Check-Stop State
In certain situations it is impossible or undesirable to
continue operation when a machine error occurs. In
these cases, the CPU may enter the check-stop state.
When the CPU is in the check-stop state, the
condition is indicated by an error indicator, an audi­
ble signal, or both. The system indicator is off, but
the state of the manual indicator depends on the
model. The exact indication of check-stop state is
model-dependent and is described in the System Library (SL) publication for the CPU. The machine enters the check-stop state only as a
result of exigent conditions. The machine may be
removed from the check-stop state by CPU reset.
When the CPU is in the check-stop state, instruc­
tions and interruptions are not executed. The inter­
val timer is not updated, and channel operations may
be suspended. The TOD clock is not normally affected
by check-stop state. The CPU timer may or may
not run in check-stop state, depending on the error
and the model. The CPU cluster meter does not run,
and the clock-out and metering-out lines are down.
The stop key and start key are not operative during
this state.
In a multiprocessing system, a CPU entering the
check-stop state generates a request for a
malfunction-alert external interruption to all CPUs configured to this CPU.
Machine-Check Interruption Conditions
Equipment malfunctions and other conditions re­
sponsible for machine-check interruptions are re­
ferred to as machine-check interruption conditions.
Two major types of conditions are identified: exigent
conditions and repressible conditions.
Repressible Conditions
Repressible conditions are those in which the se­
quential processing capability of the CPU has not
been affected. Repressible conditions can be delayed
until the completion of the current instruction, and
in most cases, even longer, without affecting the
integrity of the CPU operation. Repressible conditions
are of three types: recovery, alert, and repressible
damage. Each has one or more subclasses as
follows:
A hardware malfunction successfully corrected or
circumvented without loss of system integrity is
called a recovery condition. Depending on the model
and the type of malfunction, some recovery condi­
tions may be discarded and not reported. Recovery
conditions that are reported are grouped in one sub­
class, system recovery.
Page of GA22-70004 Revised September 1, 1975
By TNL: GN22-0498
A machine-check interruption condition not di­
rectly related to a hardware malfunction is called an
alert condition. The alert conditions contain two
subclasses: degradation and warning.
A hardware malfunction resulting in the loss of
integrity of a process in the system but not directly
affecting the sequential CPU operation is called a
repressible damage condition. Repressible damage
conditions are divided into three subclasses, identifying
the process affected: timer damage,
timing-facility damage, and external damage.
Exigent Conditions
Exigent conditions are those in which direct damage
has occurred to the CPU operation, and the current
instruction or interruption cannot safely continue.
Exigent conditions are divided into two subclasses:
instruction-processing damage, and system damage.
Malfunctions which cannot be isolated to a specific
process are indicated as system damage.
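The classification above can be summarized as a small lookup table. The following is only an illustrative sketch: the subclass and type names are taken from the text, while the Python identifiers (Kind, SUBCLASSES, is_exigent, subclasses_of) are hypothetical and not part of any machine definition.

```python
from enum import Enum


class Kind(Enum):
    """The two major types of machine-check interruption conditions."""
    REPRESSIBLE = "repressible"
    EXIGENT = "exigent"


# Subclass name -> (major type, condition type). Names follow the text;
# the table layout itself is an illustrative assumption.
SUBCLASSES = {
    "system recovery": (Kind.REPRESSIBLE, "recovery"),
    "degradation": (Kind.REPRESSIBLE, "alert"),
    "warning": (Kind.REPRESSIBLE, "alert"),
    "timer damage": (Kind.REPRESSIBLE, "repressible damage"),
    "timing-facility damage": (Kind.REPRESSIBLE, "repressible damage"),
    "external damage": (Kind.REPRESSIBLE, "repressible damage"),
    "instruction-processing damage": (Kind.EXIGENT, "exigent"),
    "system damage": (Kind.EXIGENT, "exigent"),
}


def is_exigent(subclass: str) -> bool:
    """Return True when the named subclass belongs to the exigent type."""
    return SUBCLASSES[subclass][0] is Kind.EXIGENT


def subclasses_of(condition_type: str) -> list[str]:
    """List the subclasses grouped under one condition type, in table order."""
    return [name for name, (_, ctype) in SUBCLASSES.items()
            if ctype == condition_type]
```

For example, subclasses_of("alert") yields the two alert subclasses, and is_exigent("system damage") is true.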
Machine-Check Interruption
The machine-check interruption provides a means of
reporting equipment malfunction and certain exter­
nal disturbances, and it supplies the program with
information about the extent of the resultant damage
and the location and nature of the cause.
Interruption Action
A machine-check interruption causes the PSW re­
flecting the point of interruption to be stored as the
machine-check old PSW at location 48; extended
machine-check interruption information is stored,
consisting of the information in all the control regis­
ters, general registers, floating-point registers, CPU timer, clock comparator, a region code, and a failing
storage address. Then the machine-check interrup­
tion code (MCIC) of eight bytes is stored. A new PSW is fetched from location 112. Additionally,
sometime before the storing of the machine-check
interruption code, one or several machine-check
logouts may have occurred. The machine-generated
addresses to reference the old and new PSW, the
interruption code and extended interruption informa­
tion, and the fixed logout area are all real addresses.
The extended machine-check logout address is also a
real address. If the machine-check interruption code
cannot be stored successfully or the new PSW cannot
be fetched successfully, the CPU enters the
check-stop state if the check-stop control bit is one.
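The storing sequence described above can be pictured with a toy model of real storage. This is a minimal sketch under stated assumptions, not a description of any model's implementation: locations 48 and 112 and the eight-byte interruption code come from the text, the eight-byte PSW length reflects the System/370 doubleword PSW, and the function name take_machine_check and its mcic_addr parameter are hypothetical (the text above does not give the location of the interruption code, so the caller supplies it). The storing of the extended information and any logout is omitted.

```python
OLD_PSW_ADDR = 48    # machine-check old PSW location (from the text)
NEW_PSW_ADDR = 112   # machine-check new PSW location (from the text)
PSW_LEN = 8          # a System/370 PSW occupies a doubleword
MCIC_LEN = 8         # the interruption code is eight bytes (from the text)


def take_machine_check(storage: bytearray, current_psw: bytes,
                       mcic: bytes, mcic_addr: int) -> bytes:
    """Store the old PSW and the MCIC, then return the new PSW.

    mcic_addr is a caller-supplied assumption; this sketch does not
    claim to know the architected location of the interruption code.
    """
    if len(current_psw) != PSW_LEN or len(mcic) != MCIC_LEN:
        raise ValueError("PSW and MCIC must each be eight bytes")
    # Step 1: store the PSW reflecting the point of interruption.
    storage[OLD_PSW_ADDR:OLD_PSW_ADDR + PSW_LEN] = current_psw
    # Step 2 (omitted): extended information such as registers and timers.
    # Step 3: store the machine-check interruption code.
    storage[mcic_addr:mcic_addr + MCIC_LEN] = mcic
    # Step 4: fetch the new PSW.
    return bytes(storage[NEW_PSW_ADDR:NEW_PSW_ADDR + PSW_LEN])
```

All addresses in the sketch are treated as real addresses, matching the statement that the machine-generated addresses are not subject to translation.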
A machine-check interruption due to a repressible
machine-check condition can occur only when both PSW bit 13 and the associated subclass mask are
ones. A repressible machine-check interruption does
not terminate the execution of the current instruction;
the interruption is taken after the execution of
the current instruction has come to its normal ending
and the associated program or supervisor-call interruption,
if any, has been taken. No program or
supervisor-call interruptions are eliminated. If the
repressible machine-check condition occurs during
the execution of a system function such as a timer
update, the machine-check interruption takes place
after the system function has been completed.
A machine-check interruption due to an exigent
machine-check condition can occur only when PSW bit 13 is one. The interruption terminates the execu­
tion of the current instruction and may eliminate the
program and supervisor-call interruptions, if any,
that would have occurred as a result of continuing
execution of the instruction. Proper execution of the
interruption steps, including the storing of the old PSW and diagnostic information, depends on the
nature of the malfunction. When an exigent
machine-check condition occurs during the execu­
tion of a system function, such as a timer update, the
sequence is not necessarily completed.
When PSW bit 13 is zero and an exigent machine­
check condition is generated, subsequent action de­
pends on the state of the check-stop control bit, bit
0 of control register 14. When the check-stop control
bit is zero, the machine-check condition is held
pending, and an attempt is made to complete the
execution of the current instruction and to proceed
with the next sequential instruction. When the
check-stop control bit is one, processing stops imme­
diately, and the CPU enters the check-stop state.
Depending on the model and the severity of the er­
ror, the CPU may enter the check-stop state even
when the check-stop control bit is zero.
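The enabling rules in the preceding paragraphs reduce to two small decisions, sketched below. The function names and the returned strings are hypothetical labels chosen for illustration. The sketch also ignores the model-dependent case in which the CPU enters the check-stop state even though the check-stop control bit is zero.

```python
def repressible_may_interrupt(psw_bit13: bool, subclass_mask: bool) -> bool:
    """A repressible condition can interrupt only when PSW bit 13 and the
    associated subclass mask are both ones; otherwise it is held pending."""
    return psw_bit13 and subclass_mask


def exigent_action(psw_bit13: bool, check_stop_control: bool) -> str:
    """Decide what happens when an exigent condition is generated.

    check_stop_control stands for bit 0 of control register 14.
    Assumption: the model does not force check-stop on its own.
    """
    if psw_bit13:
        return "interrupt"      # current instruction is terminated
    if check_stop_control:
        return "check-stop"     # processing stops immediately
    return "hold pending"       # attempt to continue with the next instruction
```

Note that the subclass masks play no part in the exigent case: only PSW bit 13 and the check-stop control bit are consulted.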
Similarly, if, during the execution of an interrup­
tion due to one exigent machine-check condition,
another exigent machine-check condition is detected,
subsequent action depends on the state of the check­
stop control bit. If the check-stop control bit is one,
the CPU enters the check-stop state; if the bit is
zero, an attempt is made to proceed with the condi­
tion held pending for subsequent interruption. If an
exigent machine-check condition is detected during
an interruption due to a repressible machine-check
condition, system damage is also reported.
Exigent machine-check conditions held pending
while the check-stop control bit is zero remain pend­
ing and do not cause the CPU to enter the check­
stop state if the check-stop control bit is subsequent­
ly set to one.
If a repressible machine-check condition is detect­
ed with the CPU disabled for the associated machine-check interruption condition, the condition
is held pending. If a system-recovery condition is
detected during the execution of the interruption
procedure due to a previous machine-check condi­
tion, the system-recovery condition may be com­
bined with the other conditions, discarded, or held
pending. The CPU never enters the check-stop state
because of a repressible machine-check condition. Only one machine-check interruption condition is
held pending for each subclass, regardless of the
number of conditions that may have been detected.
Machine-check interruptions can be initiated only
by an interruption condition in a subclass for which
the CPU is enabled. Conditions in other subclasses
which are pending may also be indicated in the same
interruption even though the CPU is not enabled for
those subclasses. All conditions which are indicated
are then cleared.
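The pending rules above can be modeled with a set keyed by subclass. This is a rough sketch only: the class PendingMachineChecks and its methods are hypothetical, and where the text says that pending conditions in other subclasses "may" also be indicated, the sketch simply assumes that every pending condition is indicated.

```python
class PendingMachineChecks:
    """Toy model of machine-check conditions held pending by subclass."""

    def __init__(self) -> None:
        self.pending: set[str] = set()

    def detect(self, subclass: str) -> None:
        # A set keeps at most one pending condition per subclass,
        # no matter how many times the condition is detected.
        self.pending.add(subclass)

    def take_interruption(self, enabled: set[str]) -> set[str]:
        """Return the subclasses indicated by one interruption.

        An interruption is initiated only by a pending condition in an
        enabled subclass. Assumption: all pending conditions are then
        indicated, including those in subclasses that are not enabled.
        """
        if not (self.pending & enabled):
            return set()
        indicated = set(self.pending)
        self.pending -= indicated   # indicated conditions are cleared
        return indicated
```

With "warning" detected twice and "degradation" once, an interruption enabled only for "warning" would, under this assumption, indicate both subclasses and leave nothing pending.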
Machine-check interruption conditions are han­
dled in the same manner in both the running and
wait states. In the wait state, a machine-check inter­
ruption condition for which the CPU is enabled
causes an immediate interruption.
Machine checks which occur while processing is
in the instruction-step mode are handled in the same
manner as in process mode; that is, normal recovery,
logout, and machine-check interruptions occur when
allowed. Machine checks occurring during a manual
operation such as system reset, set IC, or store, may
generate a system-recovery condition. If damage has
been caused which is not corrected or not circum­
vented, the CPU enters the check-stop state.
Every reasonable attempt is made to limit the
side-effects of any machine-check condition and the
associated interruption. Normally, I/O and external
interruptions, as well as the progress of I/O data
transfer and the updating of the timer, remain unaf­
fected. The malfunction, however, may affect these
activities, and, if the currently active PSW has bit 13
set to one, the machine-check interruption may terminate
the process of switching PSWs that is due to
another type of interruption. In these cases, system
damage will be indicated.
Point of Interruption
Because of the checkpoint capability in models with
machine retry, the interruption resulting from an
exigent machine-check interruption condition may
indicate a point in the recovery cycle which is prior
to the error. Additionally, the model may have some
choice as to which point in the recovery cycle the
interruption will indicate, and, in some cases, the
status which can be marked as valid depends on the
point chosen.
The point in the processing which is indicated by
the interruption and used as a reference point by the
machine to determine and the validity of the