HANDLING OF MACHINE CHECKS
A machine check is caused by a machine
malfunction and not by data or
instructions. This is ensured during
the power-on sequence by initializing
the machine controls to a valid state
and by placing valid CBC in the CPU registers, in the storage keys, and, if
it is volatile, also in main storage.
Designation of an unavailable component,
such as a storage unit, channel, or I/O device, does not cause a machine-check
indication. Instead, such a condition
is indicated by the appropriate program
or I/O interruption or condition-code
setting. In particular, an attempt to
access a storage location which is not
in the configuration, or which has power
off at the storage unit, results in an
addressing exception when detected by
the CPU and does not generate a
machine-check condition, even though the
storage location or its associated
storage key has invalid CBC. Similarly, if
the channel attempts to access such a
location, an I/O-interruption condition
indicating program check is generated
rather than a machine-check condition.
A machine check is indicated whenever
the result of an operation could be
affected by information with invalid CBC, or when any other malfunction makes
it impossible to establish reliably that
an operation can be, or has been,
performed correctly. When information
with invalid CBC is fetched but not
used, the condition may or may not be
indicated, and the invalid CBC is
preserved.
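The rule above, that invalid CBC matters only when the information is actually used, can be illustrated with a minimal sketch. It assumes the checking-block code is a single odd-parity bit per byte; the actual code is model-dependent and is not specified in this section. The names odd_parity_bit and cbc_is_valid are hypothetical.

```python
def odd_parity_bit(byte: int) -> int:
    """Return the check bit that makes the total count of 1 bits odd.

    Assumption: one odd-parity bit per byte stands in for the CBC.
    """
    ones = bin(byte & 0xFF).count("1")
    return (ones + 1) % 2


def cbc_is_valid(byte: int, check_bit: int) -> bool:
    """True when the stored check bit matches the stored data byte."""
    return odd_parity_bit(byte) == check_bit


# A byte stored with its correct check bit is valid; if one data bit
# later flips, the same check bit no longer matches.
stored_byte = 0x03
stored_check = odd_parity_bit(stored_byte)
damaged_byte = stored_byte ^ 0x04
print(cbc_is_valid(stored_byte, stored_check))   # valid CBC
print(cbc_is_valid(damaged_byte, stored_check))  # invalid CBC
```

A single parity bit detects any odd number of flipped bits in the byte but cannot say which bit failed, which is one reason the original contents are treated as lost once invalid CBC is found.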
When a machine malfunction is detected,
the action taken depends on the model,
the nature of the malfunction, and the
situation in which the malfunction
occurs. Malfunctions affecting
operator-facility actions may result in machine checks or may be indicated to
the operator. Malfunctions affecting
certain other operations such as SIGNAL PROCESSOR may be indicated by means of a
condition code or may result in a
machine-check-interruption condition.
A malfunction detected as part of an I/O
operation may cause a
machine-check-interruption condition, an
I/O-error condition, or both. I/O-error
conditions are indicated by an I/O
interruption or by the appropriate
condition-code setting during the
execution of an I/O instruction. When
the machine reports a failing-storage
location detected during an I/O
operation, both I/O-error and
machine-check conditions may be
indicated. The I/O-error condition is
the primary indication to the program.
The machine-check condition is a
secondary indication, which is presented
as system recovery or as an external
secondary report, together with a
failing-storage address.
VALIDATION
Machine errors can be generally
classified as solid or intermittent, according
to the persistence of the malfunction.
A persistent machine error is said to be
solid, and one that is not persistent is
said to be intermittent. In the case of
a register or storage location, a third
type of error must be considered, called
externally generated. An externally
generated error is one where no failure
exists in the register or storage
location but invalid CBC has been
introduced into the location by actions
external to the location. For example,
the value could be affected by a power
transient, or an incorrect value may
have been introduced when the
information was placed at the location.
Invalid CBC is preserved as invalid when
information with invalid CBC is fetched
or when an attempt is made to update
only a portion of the checking block.
When an attempt is made to replace the
contents of the entire checking block
and the block contains invalid CBC, it
depends on the operation and the model
whether the block remains with invalid CBC or is replaced. An operation which
replaces the contents of a checking
block with valid CBC, while ignoring the
current contents, is called a validation
operation. Validation is used to place
a valid CBC in a register or at a
location which has an intermittent or
externally generated error.
Validating a checking block does not
ensure that a valid CBC will be observed
the next time the checking block is
accessed. If the failure is solid,
validation is effective only if the
information placed in the checking block
is such that the failing bits are set to
the value to which they fail. If an
attempt is made to set the bits to the
state opposite to that in which they
fail, then the validation will not be
effective. Thus, for a solid failure,
validation is only useful to eliminate
the error condition, even though the
underlying failure remains, thereby
reducing the exposure to additional
reports. The locations, however, cannot
be used, since invalid CBC will result
from attempts to store other values at
the location. For an intermittent
failure, however, validation is useful to
restore a valid CBC such that a
subsequent partial store into the checking
block will be permitted. (A partial
store is a store into a checking block
without replacing the entire checking
block.)
Chapter 11. Machine-Check Handling 11-5
When a checking block consists of multi­
ple bytes in storage, or multiple bits
in CPU registers, the invalid CBC can be
made valid only when all of the bytes or
bits are replaced simultaneously.
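The validation rules above can be sketched as a toy model. This is an illustration under stated assumptions, not any machine's behavior: the class CheckingBlock, its methods, and the stuck_bits map are hypothetical. The model also assumes that a full-block store always acts as a validation operation, whereas the text says this depends on the operation and the model.

```python
class CheckingBlock:
    """Toy model of one checking block (hypothetical, for illustration)."""

    def __init__(self, size, stuck_bits=None):
        # stuck_bits maps (byte_index, bit_index) -> 0 or 1 and models a
        # solid failure: those bits always read back as the given value.
        self.stuck_bits = dict(stuck_bits or {})
        self.data = bytearray(size)
        self.cbc_valid = True
        self.validate(bytes(size))  # initialize by storing zeros

    def _written(self, intended):
        """Return what the hardware actually holds after a store."""
        actual = bytearray(intended)
        for (index, bit), value in self.stuck_bits.items():
            if value:
                actual[index] |= 1 << bit
            else:
                actual[index] &= ~(1 << bit) & 0xFF
        return actual

    def inject_error(self):
        """Model an externally generated error (e.g. a power transient)."""
        self.cbc_valid = False

    def validate(self, new_contents):
        """Replace the whole block, ignoring the current contents."""
        if len(new_contents) != len(self.data):
            raise ValueError("validation must replace the entire block")
        self.data = self._written(new_contents)
        # Effective only if every failing bit was set to the value
        # to which it fails.
        self.cbc_valid = self.data == bytearray(new_contents)
        return self.cbc_valid

    def partial_store(self, offset, values):
        """Store into part of the block; True if it took effect validly."""
        if offset < 0 or offset + len(values) > len(self.data):
            raise IndexError("store falls outside the checking block")
        if not self.cbc_valid:
            return False  # invalid CBC is preserved, data left unchanged
        intended = bytearray(self.data)
        intended[offset:offset + len(values)] = values
        self.data = self._written(intended)
        self.cbc_valid = self.data == intended
        return self.cbc_valid
```

In this model, a block with an injected error rejects partial stores until validate replaces the whole block. A block with a bit stuck at one validates only when the new contents also have that bit set to one, which matches the statement that such a location cannot hold other values.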
For each type of field in the system,
certain instructions are defined to
validate the field. Depending on the
model, additional instructions may also
perform validation; or, in some models, a
register is automatically validated as
part of the machine-check-interruption
sequence after the original contents of
the register are placed in the
appropriate save area.
When an error occurs in a checking
block, the original information
contained in the checking block should
be considered lost even after
validation. Automatic register
validation leaves the contents
unpredictable. Programmed and manual
validation of checking blocks causes the
contents to be changed explicitly.
Programming Note
The machine-check-interruption handler
must assume that the registers require
validation. Thus, each register should
be loaded, using an instruction defined
to validate, before the register is used
or stored.
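The ordering required by this note (load first, then use or store) can be sketched as follows. The model is hypothetical: Register, MachineCheck, and validate_registers are illustrative names, and the assumption that every register has invalid CBC on entry is the pessimistic case the note asks the handler to plan for.

```python
class MachineCheck(Exception):
    """Raised when a register with invalid CBC is used (toy model)."""


class Register:
    def __init__(self):
        self.value = 0
        self.cbc_valid = False  # assume the worst on handler entry

    def load(self, value):
        """Stand-in for an instruction defined to validate the register."""
        self.value = value & 0xFFFFFFFF
        self.cbc_valid = True

    def read(self):
        """Using or storing the register requires valid CBC."""
        if not self.cbc_valid:
            raise MachineCheck("register used before validation")
        return self.value


def validate_registers(registers, initial_values):
    """Load every register before the handler uses or stores any of them."""
    if len(registers) != len(initial_values):
        raise ValueError("one initial value is needed per register")
    for register, value in zip(registers, initial_values):
        register.load(value)
```

The values loaded are whatever the handler chooses. They are not a recovery of the original contents, which the text says should be considered lost.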
INVALID CBC IN STORAGE
The size of the checking block in
storage depends on the model but is never
more than 2K bytes.
When invalid CBC is detected in storage,
a machine-check condition may occur;
depending on the circumstances, the
machine-check condition may be system
damage, instruction-processing damage,
external damage, or system recovery. If
the invalid CBC is detected as part of
the execution of a channel program, the
error is normally reported as an
I/O-error condition. When a CCW, indirect-data-address word, or data is
prefetched from storage, is found to
have invalid CBC, but is not used in the
channel program, the condition is
normally not reported as an I/O-error
condition. The condition may or may not
be reported as a
machine-check-interruption condition. Invalid CBC
detected during accesses to storage for
other than CPU-related accesses may be
reported as system recovery with storage
error uncorrected indicated, or as
external secondary report, since the
primary error indication is reported by
some other means.
11-6 System/370 Principles of Operation
When the storage checking block consists
of multiple bytes and contains invalid
CBC, special storage-validation
procedures are generally necessary to restore
or place new information in the checking
block. Validation of storage is
provided with the manual load-clear and
system-reset-clear operations and may
also be provided as a program function.
Manual storage validation by clear reset
validates all blocks which are available
in the configuration.
A checking block with invalid CBC is
never validated unless the entire
contents of the checking block are
replaced. An attempt to store into a
checking block having invalid CBC,
without replacing the entire checking block,
leaves the data in the checking block
(including the check bits) unchanged.
Even when an instruction or a channel
program input operation specifies that
the entire contents of a checking block
are to be replaced, validation may or
may not occur, depending on the
operation and the model.
Programming Note
Machine-check conditions may be reported
for prefetched and unused data.
Depending on the model, such situations may,
or may not, be successfully retried.
For example, a BRANCH AND LINK (BALR)
instruction which specifies an R2 field
of zero will never branch, but on some
models a prefetch of the location
designated by register zero may occur.
Access exceptions associated with this
prefetch will not be reported. However,
if an invalid checking-block code is
detected, CPU retry may be attempted.
Depending on the model, the prefetch may
recur as part of the retry, and thus the
retry will not be successful. Even when
the CPU retry is successful, the
performance degradation of such a retry
is significant, and system recovery may
be presented, normally with a
failing-storage address. To avoid continued
degradation, the program should initiate
proceedings to eliminate use of the
location and to validate the location.
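One way a control program might act on this advice is sketched below. The text does not prescribe a bookkeeping scheme, so everything here is an assumption: FramePool and on_failing_storage_address are hypothetical names, and the 4096-byte frame size is chosen only for illustration.

```python
FRAME_SIZE = 4096  # assumed frame size in bytes, for illustration only


class FramePool:
    """Hypothetical table of storage frames the program may allocate."""

    def __init__(self, frame_count):
        self.available = set(range(frame_count))
        self.retired = set()

    def on_failing_storage_address(self, address):
        """Stop using the frame that contains the reported address.

        Returns the frame number so the caller can go on to validate it.
        """
        frame = address // FRAME_SIZE
        self.available.discard(frame)
        self.retired.add(frame)
        return frame


pool = FramePool(frame_count=4)
frame = pool.on_failing_storage_address(0x1234)
print(frame, sorted(pool.available))
```

Retiring the frame stops further allocation from it, so later prefetches and retries against the failing location become less likely. Validating the frame afterwards is a separate step.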
Programmed Validation of Storage
Provided that an invalid CBC does not
exist in the storage key associated with
a 4K-byte block, the instruction TEST BLOCK causes the entire 4K-byte block to
be set to zeros with a valid CBC, regardless of the current contents of
the storage. TEST BLOCK thus removes an
invalid CBC from a location in storage
which has an intermittent, or one-time,
failure. However, if a permanent
failure exists in a portion of the storage,