status stored is referred to as the "point of interruption."
Only certain points in the processing may be used
as a point of interruption. For repressible machine
checks the point of interruption must be after one
unit of operation is completed, including the associ­
ated program or supervisor-call interruption, if appli­
cable, and before the next unit of operation is begun.
Exigent machine-check conditions which are de­
layed (disallowed and presented later when allowed)
can occur only at the same points of interruption as
repressible machine-check conditions. When an exi­
gent machine-check condition is not delayed, the
point of interruption may also be after the unit of
operation is completed but before the associated
program or supervisor-call interruption occurs. In
this case, a valid PSW is defined as that which would
have been stored in the old PSW for the program or
supervisor-call interruption. Even though all status
may be indicated as valid, damage has occurred be­
cause the associated interruption is lost.
Programming Note
When an exigent machine-check condition occurs,
the point of interruption which is chosen affects the
amount of damage which must be indicated. An at­
tempt is made, when possible, to choose a point of
interruption which permits the minimum indication
of damage. In general, the preference is the interrup­
tion point immediately preceding the error. When a
point of interruption is chosen which is after an as­
sociated program or supervisor-call interruption, the
damage has not been isolated to a particular pro­
gram, and system damage is indicated.
When all the status information stored as a result
of an exigent machine-check condition does not
reflect the same point, an attempt is made when
possible to choose the point of interruption so that
the instruction address which is stored in the
machine-check old PSW is valid.
Machine-Check Logout
The storing of model-dependent information in main
storage as a result of a machine check is referred to
as a machine-check logout. Machine-check logouts
are of four different types: synchronous fixed logout,
asynchronous fixed logout, synchronous machine­
check extended logout, and asynchronous machine­
check extended logout.
When a machine-check logout occurs during the
machine-check interruption, it is called
"synchronous." If a machine-check logout occurs
without a machine-check interruption, or if the log­
out and the interruption are separated by instruction
processing or by instruction retry, then the logout is
called "asynchronous."
Machine-check-logout information can be placed
in either or both of two areas. One area, the 96-byte
area starting at location 256, is called the
"fixed-logout area." Additionally, a machine-check
extended-logout area (MCEL) is defined. The start­
ing location of the MCEL area is specified by the
contents of control register 15. The existence and
length of the machine-check extended logout are
model-dependent.
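The two logout areas can be pictured with a minimal Python sketch that slices them out of a storage image. The function names, the `storage` bytes object, and the `mcel_start` and `mcel_length` parameters are illustrative assumptions, not part of the architecture. The sketch assumes the caller has already derived the MCEL starting location from control register 15 and knows the model-dependent length.

```python
FIXED_LOGOUT_START = 256    # starting location given in the text
FIXED_LOGOUT_LENGTH = 96    # length in bytes given in the text


def fixed_logout_area(storage: bytes) -> bytes:
    """Return the 96-byte fixed-logout area (locations 256-351)."""
    end = FIXED_LOGOUT_START + FIXED_LOGOUT_LENGTH
    if len(storage) < end:
        raise ValueError("storage image is too short")
    return storage[FIXED_LOGOUT_START:end]


def mcel_area(storage: bytes, mcel_start: int, mcel_length: int) -> bytes:
    """Return the machine-check extended-logout area.

    Assumption: mcel_start was derived from control register 15 by the
    caller, and mcel_length is the model-dependent length.
    """
    if mcel_start < 0 or mcel_length < 0:
        raise ValueError("start and length must be non-negative")
    end = mcel_start + mcel_length
    if len(storage) < end:
        raise ValueError("storage image is too short")
    return storage[mcel_start:end]
```

Passing the start and length as parameters keeps the model-dependent parts outside the sketch.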
To preserve the initial machine-check conditions,
some models perform an asynchronous logout before
invoking automatic CPU recovery action. Depending
on the model, logout may occur before recovery,
after recovery, or at both times. If logout occurs at
both times it may be into the same portion or two
different portions of the logout area.
Machine-Check Extended Interruption Information
The machine-check extended interruption informa­
tion consists of seven fields, which are stored at
machine-check interruption time. Each of these
fields has a validity bit associated with it in the
machine-check interruption code. If for any reason
the machine cannot store one of these fields or can­
not store the field validly, the associated validity bit
is set to zero.
Timing Facilities: When the system-timing facilities
are present, any machine-check interruption causes
the contents of the clock comparator and CPU timer
to be placed in storage as part of the machine-check
extended interruption information. The contents of
the clock comparator are stored in the doubleword
starting at location 224. The contents of the CPU
timer are placed in the doubleword starting at
location 216.
Failing-Storage Address: When a storage error uncorrected,
storage error corrected, or key in storage
error uncorrected has been indicated, the failing­
storage address is stored in bits 8-31 of the word at
location 248. Bits 0-7 of the word are set to zeros.
In the case of storage errors, the failing-storage ad­
dress may point to any byte within the checking
block. For key in storage error uncorrected, the
failing-storage address may point to any address
within the 2,048-byte block of storage associated
with the key in storage that is in error. When an
error is detected in more than one location before
the interruption, the failing-storage address may
point to any of the failing locations. The address
stored is an absolute address; that is, the value
stored is the address that is used to reference storage
after dynamic address translation and prefixing, if
any, have been applied.
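As a hedged illustration of the field layout just described, the following Python sketch extracts the failing-storage address from a storage image. The function names and the `storage` bytes object are assumptions made for the example. The helper that computes the surrounding 2,048-byte block assumes that the blocks are aligned on 2,048-byte boundaries.

```python
import struct

FAILING_STORAGE_WORD = 248   # location of the word, from the text


def failing_storage_address(storage: bytes) -> int:
    """Return bits 8-31 of the word at location 248 as an integer."""
    # The word is read big-endian, so bit 0 is the leftmost bit.
    (word,) = struct.unpack_from(">I", storage, FAILING_STORAGE_WORD)
    # Bits 0-7 are stored as zeros; mask them anyway to be safe.
    return word & 0x00FFFFFF


def key_block_range(address: int) -> tuple:
    """Return (first, last) byte address of the 2,048-byte block.

    Assumption: blocks are aligned on 2,048-byte boundaries.
    """
    first = address & ~0x7FF
    return first, first + 2047
```

For key in storage error uncorrected, the stored address identifies only the block, so a recovery routine would work with the whole range returned by `key_block_range` rather than with a single byte.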
Region Code: The word at location 252 contains
model-dependent information which more specifical­
ly defines the location of the error. For example, it
may contain a model-dependent address of the unit
causing an external damage or recovery report.
Register Save Area: On all machine-check interrup­
tions, the addressable registers are saved sequentially
in storage. Floating-point registers 0, 2, 4, and 6 are
stored starting at location 352; when the floating­
point feature is not installed, these locations are left
unchanged. General registers 0-15 are stored start­
ing at location 384, and control registers 0-15 are
stored starting at location 448. The information
stored for control-register positions not associated
with an installed feature is unpredictable.
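The storage locations of the seven fields can be summarized in a short Python sketch that unpacks them from a storage image. The function name, the dictionary keys, and the `storage` bytes object are illustrative assumptions. The sketch returns raw bit patterns only and does not consult the validity bits, which a real handler would need to check before trusting any field.

```python
import struct

# Locations taken from the text above.
CPU_TIMER = 216
CLOCK_COMPARATOR = 224
FAILING_STORAGE_WORD = 248
REGION_CODE = 252
FP_REGS = 352        # floating-point registers 0, 2, 4, 6
GEN_REGS = 384       # general registers 0-15
CTL_REGS = 448       # control registers 0-15


def read_extended_info(storage: bytes) -> dict:
    """Unpack the extended interruption information as raw integers."""
    if len(storage) < CTL_REGS + 16 * 4:
        raise ValueError("storage image is too short")

    def dword(loc):
        return struct.unpack_from(">Q", storage, loc)[0]

    def word(loc):
        return struct.unpack_from(">I", storage, loc)[0]

    fp_values = struct.unpack_from(">4Q", storage, FP_REGS)
    return {
        "cpu_timer": dword(CPU_TIMER),
        "clock_comparator": dword(CLOCK_COMPARATOR),
        "failing_storage_word": word(FAILING_STORAGE_WORD),
        "region_code": word(REGION_CODE),
        "fp_regs": dict(zip((0, 2, 4, 6), fp_values)),
        "gen_regs": list(struct.unpack_from(">16I", storage, GEN_REGS)),
        "ctl_regs": list(struct.unpack_from(">16I", storage, CTL_REGS)),
    }
```

Note that the three register areas are contiguous: four doublewords from 352, sixteen words from 384, and sixteen words from 448, ending at location 511.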
Machine-Check Interruption Code
The machine-check interruption code (MCIC) is an
eight-byte field starting at location 232 and has the
format shown in the illustration.
Bits in the machine-check interruption code which
are not assigned, or not implemented by a particular
model, are stored as zeros.
Subclass
Bits 0-5, 7, and 8 identify the machine-check condi­
tions causing the interruption. At least one bit will
be stored as a one in the subclass field. When multi­
ple errors have occurred, several bits may be set to
ones.
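Because bit 0 is the leftmost bit of the eight-byte code, testing a subclass bit takes a little care. The following Python sketch, with hypothetical helper names, treats the interruption code as a 64-bit integer and lists which subclass bits are ones. It works with bit positions only and does not assign names to the bits.

```python
SUBCLASS_BITS = (0, 1, 2, 3, 4, 5, 7, 8)   # positions given in the text


def mcic_bit(mcic: int, n: int) -> int:
    """Return bit n of the 64-bit code, where bit 0 is the leftmost bit."""
    if not 0 <= n <= 63:
        raise ValueError("bit number must be in 0-63")
    return (mcic >> (63 - n)) & 1


def subclass_bits_set(mcic: int) -> list:
    """Return the subclass bit positions that are stored as ones."""
    return [n for n in SUBCLASS_BITS if mcic_bit(mcic, n)]
```

An empty result would be unexpected, since at least one subclass bit is stored as a one on every machine-check interruption.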
System Damage (SD): Bit 0, when one, indicates
that damage has occurred which cannot be isolated
to one or more of the less severe machine-check
damage subclasses.
Instruction Processing Damage (PD): Bit 1, when
one, indicates that a malfunction has been detected
in the processing of instructions. The exact meaning
of bit 1 depends on the setting of the backed-up bit,
bit 14.
When the backed-up bit is one, a valid instruction
address stored in the machine-check old PSW, and
the other machine status saved, point to the begin­
ning of a unit of operation prior to the point at
which the damage would have occurred. When the
backed-up bit is one and all status is indicated as
valid, the machine has successfully returned to a
checkpoint prior to the malfunction, and no damage
has yet occurred.
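The interaction between bit 1 and the backed-up bit can be sketched in Python as follows. The function names and the returned strings are illustrative assumptions, and `all_status_valid` stands in for whatever check of the validity indicators the caller has performed.

```python
PD_BIT = 1           # instruction processing damage
BACKED_UP_BIT = 14   # backed-up bit


def mcic_bit(mcic: int, n: int) -> bool:
    """Test bit n of the 64-bit code, where bit 0 is the leftmost bit."""
    return bool((mcic >> (63 - n)) & 1)


def describe_processing_damage(mcic: int, all_status_valid: bool) -> str:
    """Summarize how bit 1 should be read, given the backed-up bit."""
    if not mcic_bit(mcic, PD_BIT):
        return "PD not indicated"
    if not mcic_bit(mcic, BACKED_UP_BIT):
        return "PD indicated, not backed up"
    if all_status_valid:
        # The machine returned to a checkpoint prior to the malfunction.
        return "PD indicated, backed up, no damage has yet occurred"
    return "PD indicated, backed up, some status invalid"
```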
When the backed-up bit is zero, a valid instruc­
tion address points to the beginning of an instruction
containing a unit of operation beyond the damaged
unit of operation. For damage to be indicated as
instruction processing damage, the damaged instruc­
tion and the point of interruption must not be sepa­
rated by an interruption or by a LOAD PSW instruc­
tion, and the extent of the damage must fall within
one or more of the following categories:
1. The damaged area still contains invalid CBC.
2. The damaged area lies within the destination
operand of the instruction.
3. The damaged area lies within the general regis­
ters, floating-point registers, control registers,
or PSW.
System Recovery (SR): Bit 2, when one, indicates
that malfunctions were detected but have been suc­
cessfully corrected or circumvented without the loss
of system integrity. CPU-detected malfunctions are
reported as system recovery only if the CPU success­
fully completes the operation or unit of operation in
which the malfunction was detected. Some I/O-detected
damage conditions may result in a system
recovery condition in addition to the I/O interruption.
The indication of system recovery does not
[Figure: Machine-Check Interruption-Code Format]
Bits 0-5, 7, 8                   Subclass
Bits 14-15                       Time of interruption occurrence
Bits 16-18                       Storage errors
Bits 20-31, 46, 47               Validity indicators
Bits 6, 9-13, 19, 26, 32-45      Not assigned, stored as zeros
Rightmost field (ends at bit 63) Machine-check extended logout length
178 System/370 Principles of Operation