HANDLING OF MACHINE-CHECK CONDITIONS
FLOATING INTERRUPTION CONDITIONS
An interruption condition which is made
available to any CPU in a multiprocessing
configuration is called a floating
interruption condition. The first CPU that accepts the interruption clears the
interruption condition, and it is no
longer available to any other CPU in the
configuration.
The service-signal external-interruption condition is a floating interruption
condition. Depending on the model, some
machine-check-interruption conditions
associated with system recovery, warning, and external secondary report
may be floating interruption conditions.
A floating interruption is presented to
the first CPU in the configuration which is enabled for the interruption condition
and can accept the interruption. A CPU cannot accept the interruption when
it is in the check-stop state, has an
invalid prefix, is performing an unending
string of interruptions due to a PSW-format error of the type that is
recognized early, is executing a READ DIRECT instruction, or is in the stopped
state. However, a CPU with the rate
control set to instruction step can
accept the interruption when the start
key is activated.
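The acceptance rule above can be illustrated with a minimal sketch. The `Cpu` class, its field names, and `present_floating` are hypothetical names introduced only for this illustration. Scanning the CPUs in list order is an assumption standing in for "the first CPU that accepts", and the instruction-step case is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    """Hypothetical model of the CPU states named in the text."""
    name: str
    enabled: bool = True          # enabled for the floating condition
    check_stopped: bool = False   # in the check-stop state
    invalid_prefix: bool = False
    psw_error_loop: bool = False  # unending early PSW-format-error interruptions
    in_read_direct: bool = False  # executing READ DIRECT
    stopped: bool = False         # in the stopped state

    def can_accept(self) -> bool:
        # Any one of the listed states prevents acceptance.
        return not (self.check_stopped or self.invalid_prefix
                    or self.psw_error_loop or self.in_read_direct
                    or self.stopped)


def present_floating(cpus, pending):
    """Offer each pending floating condition to the first eligible CPU.

    Returns (condition, cpu_name) pairs. An accepted condition is removed
    from `pending`, so it is no longer available to any other CPU.
    """
    accepted = []
    for condition in list(pending):
        for cpu in cpus:
            if cpu.enabled and cpu.can_accept():
                pending.remove(condition)
                accepted.append((condition, cpu.name))
                break
    return accepted
```

A condition that no CPU can accept simply stays in `pending`, which mirrors the text: the condition remains available until some CPU in the configuration accepts it.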
Programming Note
When a CPU enters the check-stop state
in a multiprocessing configuration, the
program on another CPU can determine
whether a floating interruption may have
been reported to the failing CPU and
then lost. This can be accomplished if
the interruption program places zeros in
the real storage locations containing
old PSWs and interruption codes after
the interruption has been handled (or
has been moved into another area for
later processing). After a CPU enters
the check-stop state, the program in
another CPU can inspect the old-PSW and
interruption-code locations of the failing
CPU. A nonzero value in an old PSW or interruption code indicates that the CPU has been interrupted but the program
did not complete the handling of the
interruption.
Floating Machine-Check-Interruption Conditions
Floating machine-check-interruption conditions
are reset only by the manually
initiated resets through the operator
facilities. When a machine check occurs
which prohibits completion of a floating
machine-check interruption, the interruption
condition is no longer considered
a floating interruption condition,
and system damage is indicated.
MACHINE-CHECK MASKING
All machine-check interruptions are
under control of the machine-check mask,
PSW bit 13. In addition, some machine-check
conditions are controlled by
subclass masks in control register 14.
The exigent machine-check conditions
(system damage and instruction-processing
damage) are controlled only
by the machine-check mask, PSW bit 13.
When PSW bit 13 is one, an exigent
condition causes a machine-check interruption.
When PSW bit 13 is zero and
the check-stop-control bit, bit 0 of
control register 14, is one, the occurrence
of an exigent machine-check
condition causes the CPU to enter the
check-stop state. When PSW bit 13 is zero and the check-stop-control bit is zero, the machine may attempt to continue
or may enter the check-stop state
depending on the type of error.
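The three cases for an exigent condition can be sketched as a small decision function. This is an illustration only: the constant names, the `Action` enum, and `exigent_action` are hypothetical, and the masks assume a 64-bit PSW and a 32-bit control register in which bit 0 is the leftmost bit.

```python
from enum import Enum

# Assumed numbering: bit 0 is the leftmost (most significant) bit.
PSW_MACHINE_CHECK_MASK = 1 << (63 - 13)   # PSW bit 13
CR14_CHECK_STOP_CONTROL = 1 << (31 - 0)   # control register 14, bit 0


class Action(Enum):
    INTERRUPT = "machine-check interruption"
    CHECK_STOP = "check-stop state"
    MODEL_DEPENDENT = "continue or check-stop, depending on the error"


def exigent_action(psw: int, cr14: int) -> Action:
    """Outcome of an exigent machine-check condition per the rules above."""
    if psw & PSW_MACHINE_CHECK_MASK:
        return Action.INTERRUPT        # PSW bit 13 is one
    if cr14 & CR14_CHECK_STOP_CONTROL:
        return Action.CHECK_STOP       # masked off, check-stop control is one
    return Action.MODEL_DEPENDENT      # masked off, check-stop control is zero
```

The third outcome is deliberately left as a single value, since the text does not say which types of error allow the machine to continue.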
The repressible machine-check conditions,
except vector-facility failure
and service-processor damage, are
controlled both by the machine-check mask, PSW bit 13, and by four subclass-mask
bits in control register 14. If PSW bit 13 is one and one of the
subclass-mask bits is one, the associated
condition initiates a machine-check
interruption. If a subclass-mask bit is zero, the associated condition does not
initiate an interruption but is held
pending. However, when a machine-check
interruption is initiated because of a
condition for which the CPU is enabled,
those conditions for which the CPU is
not enabled may be presented along with
the condition which initiates the interruption.
All conditions presented are
then cleared.
Control register 14 contains mask bits that specify whether certain conditions
can cause machine-check interruptions;
it has the following format:
[Figure: format of control register 14, with the check-stop-control bit in position 0 and the four subclass-mask bits in positions 4-7]
With the exception of bit 0, which is
provided on all models, each of the bits
is necessarily provided only if the
associated function is provided.
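The masking rules for repressible conditions can be sketched as follows. All names here (`cr_bit`, `SUBCLASS_MASK`, `take_repressible`) are hypothetical. The bit positions are the ones given in the subclass-mask descriptions of this section, and the sketch assumes a 32-bit register in which bit 0 is the leftmost bit. The text says disabled conditions "may" be presented along with an enabled one; this sketch always presents them, which is only one permitted behavior.

```python
def cr_bit(n: int) -> int:
    """Mask for bit n of a 32-bit control register (bit 0 is leftmost)."""
    return 1 << (31 - n)


# Subclass-mask bits of control register 14, keyed by condition class.
SUBCLASS_MASK = {
    "system recovery": cr_bit(4),
    "degradation": cr_bit(5),
    "external damage": cr_bit(6),
    "warning": cr_bit(7),
}


def take_repressible(pending: set, psw_bit13: bool, cr14: int) -> set:
    """Return the conditions presented by one interruption (empty if none).

    `pending` is modified in place: presented conditions are cleared,
    while conditions that could not interrupt remain pending.
    """
    if not psw_bit13:
        return set()                   # machine-check mask is zero
    enabled = {c for c in pending if cr14 & SUBCLASS_MASK[c]}
    if not enabled:
        return set()                   # everything stays pending
    presented = set(pending)           # model choice: present all pending
    pending.clear()                    # all conditions presented are cleared
    return presented
```

For example, with only the warning mask set, a pending degradation condition stays pending on its own, but it is presented and cleared together with a warning condition once one arrives.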
Programming Note
The program should avoid, whenever
possible, operating with PSW bit 13, the
machine-check mask, set to zero, since any exigent machine-check condition
which is recognized during this situation
may cause the CPU to enter the
check-stop state. In particular, the
program should avoid executing I/O instructions or allowing I/O interruptions
with PSW bit 13 zero.
Check-Stop Control
Bit 0 (CS) of control register 14
controls the system action taken when an
exigent machine-check condition occurs
under one of the following two conditions:
1. The CPU is disabled for machine-check
interruptions (that is, PSW
bit 13 is zero).
2. An exigent machine-check condition
occurs during the process of storing
the machine-check-interruption
code, storing the machine-check old PSW, or fetching the machine-check
new PSW during a machine-check
interruption.
If the check-stop-control bit is one and
either condition occurs, the machine
enters the check-stop state; if the
check-stop-control bit is zero, the
machine may attempt to continue or may
enter the check-stop state, depending on
the type of error and the model. The
check-stop-control bit is initialized to
one. If damage occurs to control register
14, the check-stop-control bit is
assumed to be one.
Recovery Subclass Mask
Bit 4 (RM) of control register 14 controls
system-recovery interruption conditions.
This bit is initialized to
zero.
Degradation Subclass Mask
Bit 5 (DM) of control register 14
controls degradation interruption conditions.
This bit is initialized to zero.
External-Damage Subclass Mask
Bit 6 (EM) of control register 14 controls
timing-facility-damage, interval-timer-damage,
and external-damage interruption
conditions. This bit is
initialized to one.
Warning Subclass Mask
Bit 7 (WM) of control register 14
controls warning interruption conditions.
This bit is initialized to zero.
MACHINE-CHECK LOGOUT
Some models place model-dependent information
in main storage as a result of a
machine check. This is referred to as a
machine-check logout. Machine-check
logouts are of four different types:
synchronous fixed logout, asynchronous
fixed logout, synchronous machine-check
extended logout, and asynchronous
machine-check extended logout.
Machine-check-logout information may,
depending on the model, be placed in the
machine-check extended-logout (MCEL) area. The starting real location of the MCEL area is designated by the contents
of control register 15. The existence
and length of the MCEL are model-dependent.
Some models may place model-dependent
information in the fixed-logout area.
This area is 96 bytes in length and
starts at real location 256. The fixed
logout may be in addition to or instead
of an extended logout.
When a machine-check logout occurs
during the machine-check interruption,
it is called a synchronous logout. If a
machine-check logout occurs without a
machine-check interruption, or if the
logout and the interruption are separated
by instruction processing or by CPU retry, then the logout is called an
asynchronous logout.
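The four logout types and the fixed-logout area bounds can be summarized in a short sketch. The function and parameter names are hypothetical; only the location (256), the length (96 bytes), and the synchronous/asynchronous rule come from the text.

```python
FIXED_LOGOUT_START = 256    # real location given in the text
FIXED_LOGOUT_LENGTH = 96    # length in bytes given in the text


def fixed_logout_range() -> range:
    """Real locations occupied by the fixed-logout area."""
    return range(FIXED_LOGOUT_START, FIXED_LOGOUT_START + FIXED_LOGOUT_LENGTH)


def logout_type(extended: bool, interruption_occurred: bool,
                separated: bool = False) -> str:
    """Name one of the four logout types.

    `separated` means instruction processing or CPU retry came between
    the logout and the machine-check interruption.
    """
    synchronous = interruption_occurred and not separated
    timing = "synchronous" if synchronous else "asynchronous"
    area = "machine-check extended logout" if extended else "fixed logout"
    return f"{timing} {area}"
```

Under these assumptions the fixed-logout area spans real locations 256 through 351, and a logout with no accompanying interruption is always classified as asynchronous.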
To preserve the initial machine-check
conditions, some models perform an asynchronous
logout before invoking CPU retry. Depending on the model, logout
may occur before recovery, after recovery,
or at both times. If logout occurs
at both times, it may be into the same
portion or two different portions of the
logout area.