The main storage analysis subroutine is given control when the machine check interruption was caused by a multiple-bit storage error. An
initial function points the machine check new PSW to an internal
subroutine to indicate a solid machine check, in case a machine check
interruption occurs while exercising main storage.
Damaged storage areas associated with any portion of the CP nucleus
itself cannot be refreshed; multiple-bit storage errors in CP cause the VM/370 system to be terminated. An automatic restart reinitializes VM/370. If the damage is not in the CP nucleus, main storage is exercised to determine if the failure is solid or intermittent. Multiple-bit ECC
storage errors on a 3031, 3032, or 3033 processor are always treated as
solid errors. If the failure is solid, the 4K page frame is marked
unavailable for use by the system. If the failure is intermittent, the
page frame is marked invalid. The change bits associated with the
damaged page frame are checked to determine if the page had been
altered by the virtual machine. If no alteration had occurred, VM/370 assigns a new page frame to the virtual machine and a backup copy of the
page is brought into storage the next time the page is referenced. If
the page had been altered, VM/370 resets or terminates the virtual
machine, clears its virtual storage, and sends an appropriate message to
the user. Normal system operation continues for all other users.
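The decisions made by the main storage analysis subroutine can be summarized in a short sketch. This is an illustration only, not the CP code: the `StorageError` fields, the function name, and the returned strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StorageError:
    # All field names are hypothetical; they stand for facts CP determines.
    in_cp_nucleus: bool      # damage lies in the CP nucleus
    exercised_solid: bool    # exercising main storage reproduced the failure
    page_changed: bool       # change bits show the page was altered
    is_303x: bool = False    # 3031, 3032, or 3033 processor


def analyze_storage_error(err: StorageError) -> Tuple[Optional[str], str]:
    """Return (page frame status, recovery action) for a multiple-bit error."""
    if err.in_cp_nucleus:
        # CP nucleus pages cannot be refreshed: terminate and restart VM/370.
        return None, "terminate system, automatic restart"

    # On a 303x processor the error is always treated as solid.
    solid = err.is_303x or err.exercised_solid
    frame_status = "unavailable" if solid else "invalid"

    if err.page_changed:
        action = "reset or terminate virtual machine"
    else:
        action = "assign new page frame, page in backup copy on next reference"
    return frame_status, action
```

For example, an intermittent error on an unaltered page outside the CP nucleus yields the status `"invalid"` and the reassignment action, and no virtual machine is disturbed.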
The SPF analysis subroutine is given control when the machine check
interruption was caused by an SPF error. An initial function points the machine check new PSW to an internal subroutine, in case a machine check
interruption occurs during testing and validation. The SPF analysis
routine then determines if the error was associated with a failure in
virtual machine storage or in the storage associated with the control
program.
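In both cases, as the following paragraphs describe, the subroutine classifies the error by exercising all 16 keys in the failing 2K page frame 5 times each. A minimal sketch of that test follows. The function names and the `exercise_key` callback are hypothetical, the order in which the keys and passes are visited is an assumption, and the bit masks assume the System/370 storage key layout (reference bit 5, change bit 6).

```python
from typing import Callable

KEY_COUNT = 16   # storage protection keys 0 through 15
PASSES = 5       # each key is exercised 5 times

REFERENCE_BIT = 0x04  # bit 5 of a System/370 storage key (assumed layout)
CHANGE_BIT = 0x02     # bit 6 of a System/370 storage key (assumed layout)


def classify_spf_error(exercise_key: Callable[[int], bool]) -> str:
    """exercise_key(key) returns True if an SPF machine check occurred."""
    for _ in range(PASSES):
        for key in range(KEY_COUNT):
            if exercise_key(key):
                return "solid"
    return "intermittent"


def key_to_restore(owner: str, saved_key: int = 0) -> int:
    """Key stored back into the frame after an intermittent error."""
    if owner == "cp":
        # Control program storage gets the zero key back.
        return 0
    # Virtual machine storage: the key saved in the SWPTABLE, with the
    # change and reference bits set on.
    return saved_key | REFERENCE_BIT | CHANGE_BIT
```

A frame that never produces a machine check across all 80 attempts is classified as intermittent; a single machine check during the test classifies it as solid.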
An SPF error associated with VM/370 is a potentially catastrophic
failure. Namely, VM/370 always runs with a PSW key of zero, which means that the SPF key in main storage is not checked for an out-of-parity
condition. The SPF analysis subroutine exercises all 16 keys in the
failing storage 2K page frame. If an SPF machine check occurs in
exercising the 16 keys 5 times each, the error is considered solid and
the operating system is terminated with a system shutdown. If an SPF machine check does not occur, the machine check is considered
intermittent. The zero key is restored to the failing 2K page frame and
this is transparent to the virtual machine.
If an SPF machine check associated with a virtual machine occurs, the SPF analysis subroutine exercises all 16 keys in the
failing storage 2K page frame. If an SPF machine check does not occur,
the machine check is intermittent and the SWPTABLE for the page
associated with the failing storage address is located. The storage key
for the failing 2K storage page frame is retrieved from the SWPTABLE and
the change and reference bits are set on in the storage key. The
storage key is then stored into the affected failing storage 2K page
frame. If an SPF machine check occurs in exercising the 16 keys 5 times each, then the machine check is considered solid and the following
actions are taken. (1) The virtual machine is selectively reset or
terminated by the virtual machine termination subroutine; (2) The 4K
page frame associated with the failing address is removed as an
available system resource. This is accomplished by locating the CORTABLE for the
defective page and altering the CORFPNT and CORBPNT pointers to make the page unavailable to the system. The CORDISA bit
is set on in this CORTABLE to identify the reason for the status of this
page in a system dump.
1-154 IBM VM/370 System Logic and Problem Determination--Volume 1
The recovery facility mode switching subroutine (DMKMCHMS) allows the
service representative to change the mode that processor retry and ECC
recording are operating in. This subroutine receives control when a
user with privilege class F issues some form of the SET command with the MODE operand. A check is initially made to determine if this is VM/370 running under VM/370. If this is the case, the request is ignored and
control is returned to the calling routine. For the format and usage of
the SET command with the MODE operand, refer to the VM/370 Operator's Guide.
The operator communication subroutine is invoked when the integrity of
the system has degraded to a point where automatic shutdown and reload
of the system has been tried and was unsuccessful, or could not be
attempted due to the severity of the hardware failure. A check is first made to determine if the system operator is logged on as a user; next, a
check is made to determine if the system operator is disconnected. If
either of these checks is not affirmative, a message cannot be issued
directly to the system operator. An LPSW is performed to place the
processor in a disabled wait state with a recognizable wait state code
in the processor instruction counter.
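The two operator checks can be sketched as follows. This is an interpretation, not the CP code: the function name and step strings are hypothetical, the sketch assumes a message is possible only when the operator is logged on and not disconnected, and it assumes the disabled wait state is entered in every case, since this subroutine runs only when automatic recovery has failed or could not be attempted.

```python
from typing import List


def operator_communication(logged_on: bool, disconnected: bool) -> List[str]:
    """Return the steps taken, in order (hypothetical step names)."""
    steps = []
    # Assumption: the operator can be reached directly only when logged on
    # and not disconnected.
    if logged_on and not disconnected:
        steps.append("issue message to system operator")
    # Assumption: the LPSW is performed whether or not a message was issued.
    steps.append("LPSW: disabled wait state with wait state code")
    return steps
```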
The virtual machine termination subroutine selectively resets or terminates a virtual user whose operation has been interrupted by an
uncorrectable machine check. First, the machine is marked nondispatchable to prevent the damaged machine from running before reset
or termination is performed. The machine check record is formatted and DMKIOEMC is called to record the error. Then the user is notified by a
call to DMKQCNWT that a machine check has occurred and that his
operation is terminated. The primary system operator is notified of the
virtual user termination by a message issued by a call to DMKQCNWT. If
the virtual machine is running in the virtual=real area, DMKUSO is
called to log the virtual machine off the system and to return the
storage previously allocated to the virtual machine and to clear any
outstanding virtual machine I/O requests. The HOLD option of LOGOFF is
invoked to allow a user on a dial facility to retain the connection and
thus permit LOGON without re-establishing the line connection. However,
if the virtual machine is running in the virtual area, DMKCFM is
called to put the virtual machine in console function mode, and the
user must re-initialize the system to commence operation.
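The order of operations in the virtual machine termination subroutine can be sketched as below. Only the module names (DMKIOEMC, DMKQCNWT, DMKUSO, DMKCFM) come from the text; the function, the dictionary keys, and the step strings are hypothetical, and the sketch records the steps instead of performing them.

```python
from typing import Dict, List


def terminate_virtual_machine(vm: Dict[str, bool]) -> List[str]:
    """Return the ordered steps for a virtual machine hit by an
    uncorrectable machine check (illustrative only)."""
    steps = []
    # Keep the damaged machine from running before reset or termination.
    vm["dispatchable"] = False
    steps.append("mark nondispatchable")
    steps.append("format machine check record; DMKIOEMC records the error")
    steps.append("DMKQCNWT: notify user")
    steps.append("DMKQCNWT: notify primary system operator")
    if vm["virtual_equals_real"]:
        # Log off, return storage, clear outstanding I/O; HOLD keeps the line.
        steps.append("DMKUSO: log off with HOLD option")
    else:
        # The user must re-initialize the system to commence operation.
        steps.append("DMKCFM: console function mode")
    return steps
```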