CHAPTER 11. MACHINE-CHECK HANDLING

Machine-Check Detection ... 11-2
Correction of Machine Malfunctions ... 11-2
Error Checking and Correction ... 11-2
CPU Retry ... 11-3
Effects of CPU Retry ... 11-3
Checkpoint Synchronization ... 11-3
Handling of Machine Checks during Checkpoint Synchronization ... 11-3
Checkpoint-Synchronization Operations ... 11-4
Checkpoint-Synchronization Action ... 11-4
Unit Deletion ... 11-4
Handling of Machine Checks ... 11-5
Validation ... 11-5
Invalid CBC in Storage ... 11-6
Programmed Validation of Storage ... 11-6
Invalid CBC in Storage Keys ... 11-7
Invalid CBC in Registers ... 11-9
Check-Stop State ... 11-10
System Check Stop ... 11-11
Machine-Check Interruption ... 11-11
Exigent Conditions ... 11-11
Repressible Conditions ... 11-12
Interruption Action ... 11-12
Point of Interruption ... 11-14
Machine-Check-Interruption Code ... 11-15
Subclass ... 11-16
System Damage ... 11-16
Instruction-Processing Damage ... 11-17
System Recovery ... 11-17
Interval-Timer Damage ... 11-17
Timing-Facility Damage ... 11-17
External Damage ... 11-18
Vector-Facility Failure ... 11-18
Degradation ... 11-18
Warning ... 11-18
Service-Processor Damage ... 11-18
Subclass Modifiers ... 11-18
Vector-Facility Source ... 11-19
Backed Up ... 11-19
Delayed ... 11-19
Delayed Access Exception ... 11-19
Synchronous Machine-Check-Interruption Conditions ... 11-19
Processing Backup ... 11-19
Processing Damage ... 11-20
Storage Errors ... 11-20
Storage Error Uncorrected ... 11-20
Storage Error Corrected ... 11-20
Storage-Key Error Uncorrected ... 11-21
Storage Degradation ... 11-21
Indirect Storage Error ... 11-21
Machine-Check Interruption-Code Validity Bits ... 11-21
PSW-EMWP Validity ... 11-22
PSW Mask and Key Validity ... 11-22
PSW Program-Mask and Condition-Code Validity ... 11-22
PSW-Instruction-Address Validity ... 11-22
Failing-Storage-Address Validity ... 11-22
Region-Code Validity ... 11-22
External-Damage-Code Validity ... 11-22
Floating-Point-Register Validity ... 11-22
General-Register Validity ... 11-23
Control-Register Validity ... 11-23
Logout Validity ... 11-23
Storage Logical Validity ... 11-23
CPU-Timer Validity ... 11-23
Clock-Comparator Validity ... 11-23
Machine-Check Extended-Logout Length ... 11-23
Machine-Check Extended Interruption Information ... 11-24
Register-Save Areas ... 11-24
Chapter 11. Machine-Check Handling 11-1
External-Damage Code ... 11-24
Failing-Storage Address ... 11-26
Region Code ... 11-26
Handling of Machine-Check Conditions ... 11-27
Floating Interruption Conditions ... 11-27
Floating Machine-Check-Interruption Conditions ... 11-27
Machine-Check Masking ... 11-27
Check-Stop Control ... 11-28
Recovery Subclass Mask ... 11-28
Degradation Subclass Mask ... 11-28
External-Damage Subclass Mask ... 11-28
Warning Subclass Mask ... 11-28
Machine-Check Logout ... 11-28
Logout Controls ... 11-29
Synchronous Machine-Check Extended-Logout Control ... 11-29
Input/Output Extended-Logout Control ... 11-29
Asynchronous Machine-Check Extended-Logout Control ... 11-29
Asynchronous Fixed-Logout Control ... 11-29
Machine-Check Extended-Logout Address ... 11-29
Summary of Machine-Check Masking and Logout ... 11-30
The machine-check-handling mechanism provides extensive equipment-malfunction detection to ensure the integrity of system operation and to permit automatic recovery from some malfunctions. Equipment malfunctions and certain external disturbances are reported by means of a machine-check interruption to assist in program-damage assessment and recovery. The interruption supplies the program with information about the extent of the damage and the location and nature of the cause. Equipment malfunctions, errors, and other situations which can cause machine-check interruptions are referred to as machine checks.

MACHINE-CHECK DETECTION

Machine-check-detection mechanisms may take many forms, especially in control functions for arithmetic and logical processing, addressing, sequencing, and execution. For program-addressable information, detection is normally accomplished by encoding redundancy into the information in such a manner that most failures in the retention or transmission of the information result in an invalid code. The encoding normally takes the form of one or more redundant bits, called check bits, appended to a group of data bits. Such a group of data bits and the associated check bits is called a checking block. The size of the checking block depends on the model.
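As an illustration of a checking block, the Python sketch below appends one check bit to a group of eight data bits and tests whether a stored block is still valid. The 8-bit data width, the odd-parity convention, and the names `make_checking_block` and `is_valid` are assumptions made for the example; the text states only that the checking-block size depends on the model.

```python
# Sketch of a checking block: 8 data bits plus 1 check bit.
# Assumptions (not from the manual): 8-bit data group, odd parity.

DATA_BITS = 8  # hypothetical data width of one checking block


def make_checking_block(data: int) -> int:
    """Append an odd-parity check bit to `data`, giving a 9-bit block."""
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("data does not fit in the checking block")
    ones = bin(data).count("1")
    check_bit = 0 if ones % 2 == 1 else 1  # make the total count of 1s odd
    return (data << 1) | check_bit


def is_valid(block: int) -> bool:
    """Under odd parity, a block is valid when its count of 1 bits is odd."""
    return bin(block).count("1") % 2 == 1


block = make_checking_block(0b1011_0010)
print(is_valid(block))          # True
print(is_valid(block ^ 0b100))  # False: one bit failed in retention or transmission
```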
The inclusion of a single check bit in the checking block allows the detection of any single-bit failure within the checking block. In this arrangement, the check bit is sometimes referred to as a "parity bit." In other arrangements, a group of check bits is included to permit detection of multiple errors, to permit error correction, or both.

11-2 System/370 Principles of Operation
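Why a single check bit suffices for single-bit failures but not for multiple errors can be confirmed exhaustively. This sketch again assumes a hypothetical block of eight data bits plus one odd-parity check bit (the names `encode`, `parity_ok`, and `detection_rates` are illustrative). It flips every single bit and every pair of bits in every possible block and reports the fraction of failures that produce an invalid code.

```python
from itertools import combinations

BLOCK_BITS = 9  # hypothetical: 8 data bits + 1 check (parity) bit


def encode(data: int) -> int:
    """Append an odd-parity check bit to an 8-bit data value."""
    return (data << 1) | (0 if bin(data).count("1") % 2 else 1)


def parity_ok(block: int) -> bool:
    """True when the block still has odd parity (no failure detected)."""
    return bin(block).count("1") % 2 == 1


def detection_rates() -> tuple:
    """Fraction of single-bit and double-bit failures that are detected."""
    single_hit = single_total = double_hit = double_total = 0
    for data in range(256):
        block = encode(data)
        for i in range(BLOCK_BITS):
            single_total += 1
            single_hit += not parity_ok(block ^ (1 << i))
        for i, j in combinations(range(BLOCK_BITS), 2):
            double_total += 1
            double_hit += not parity_ok(block ^ (1 << i) ^ (1 << j))
    return single_hit / single_total, double_hit / double_total


print(detection_rates())  # (1.0, 0.0)
```

Every single-bit failure changes the parity and is caught, while every double-bit failure restores it and goes unnoticed, which is why detecting multiple errors or correcting errors requires a group of check bits.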
For checking purposes, the contents of the entire checking block, including the redundancy, are called the checking-block code (CBC). When a CBC completely meets the checking requirements (that is, no failure is detected), it is said to be valid. When both detection and correction are provided and a CBC is not valid but satisfies the checking requirements for correction (the failure is correctable), it is said to be near-valid. When a CBC does not satisfy the checking requirements (the failure is uncorrectable), it is said to be invalid.

CORRECTION OF MACHINE MALFUNCTIONS

Three mechanisms may be used to provide recovery from machine-detected malfunctions: error checking and correction, CPU retry, and unit deletion.
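The valid, near-valid, and invalid states of a CBC can be illustrated with an extended Hamming code, a common single-error-correcting, double-error-detecting (SEC-DED) arrangement. This is a swapped-in textbook code, not necessarily the code any System/370 model uses; the 4-data-bit block, the bit layout, and the names `encode`, `syndrome`, and `classify` are assumptions for the sketch.

```python
from typing import Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # Hamming positions carrying data bits
CHECK_POSITIONS = (1, 2, 4)    # Hamming positions carrying check bits
# Bit 0 holds an overall-parity check bit (even parity over all 8 bits).


def syndrome(block: int) -> int:
    """XOR of the positions 1..7 holding a 1; zero for an undamaged block."""
    s = 0
    for pos in range(1, 8):
        if (block >> pos) & 1:
            s ^= pos
    return s


def encode(data: int) -> int:
    """Encode 4 data bits into an 8-bit SEC-DED checking block."""
    block = 0
    for k, pos in enumerate(DATA_POSITIONS):
        if (data >> k) & 1:
            block |= 1 << pos
    s = syndrome(block)
    for pos in CHECK_POSITIONS:  # setting check bit `pos` cancels that bit of s
        if s & pos:
            block |= 1 << pos
    if bin(block).count("1") % 2:  # make overall parity even
        block |= 1
    return block


def classify(block: int) -> Tuple[str, int]:
    """Return the CBC state and the (possibly corrected) block."""
    s = syndrome(block)
    overall_odd = bin(block).count("1") % 2 == 1
    if s == 0 and not overall_odd:
        return "valid", block
    if overall_odd:  # odd number of failed bits: assume exactly one
        return "near-valid", block ^ (1 << s)  # s == 0 means bit 0 failed
    return "invalid", block  # even number (>= 2) of failed bits


word = encode(0b1011)
print(classify(word)[0])                # valid
status, fixed = classify(word ^ (1 << 6))
print(status, fixed == word)            # near-valid True
print(classify(word ^ 0b1010_0000)[0])  # invalid
```

A block with three or more failed bits can be misclassified as near-valid; this kind of code guarantees only the single-bit (correctable) and double-bit (detectable) cases.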
Machine failures which are corrected successfully may or may not be reported as machine-check interruptions. If reported, they are system-recovery conditions, which permit the program to note the cause of CPU delay and to keep a log of such incidents.

ERROR CHECKING AND CORRECTION

When sufficient redundancy is included in circuitry or in a checking block, failures can be corrected. For example, circuitry can be triplicated, with a voting circuit to determine the correct value by selecting two matching results