The soft recording subroutine performs two basic functions. It formats a
machine check record and calls DMKIOEMC to record the error on the error
recording cylinder. It also maintains the threshold for processor retry
and ECC errors and switches from recording to quiet mode when the
threshold value is exceeded. To accomplish this, a counter is maintained
by DMKMCH for successful processor retry and corrected ECC errors.

Recording mode (bit 4 of control register 14 set to one) is the
initialized state, and the normal operating state of VM/370, for
processor retry errors. Recording mode may also be entered by use of the
CP SET command. When 12 soft machine checks have occurred, the soft
recording subroutine switches the processor from recording mode to quiet
mode. For the purpose of model-independent implementation, this is
accomplished by setting bit 4 of control register 14 to zero. Because in
quiet mode no soft machine check interruptions occur, a switch from quiet
mode to recording mode can be made by issuing the SET MODE {RETRY|MAIN}
RECORD command. While in recording mode, corrected CPU retry reports are
formatted and recorded on the VM/370 error recording cylinder, but the
primary systems operator is not informed of these occurrences.

Quiet mode (bit 4 of control register 14 set to 0) can be entered in one
of two ways: (1) when 12 soft machine checks have occurred, or (2) when
the SET MODE RETRY QUIET command is executed by a class F user. In this
mode, both processor retry and ECC reporting are disabled. The processor
remains in quiet mode until the next system IPL (warm start or cold
start) occurs or a SET MODE {RETRY|MAIN} RECORD command is executed by a
class F user. SET MODE MAIN is treated as invalid on a 3031, 3032, or
3033 processor.
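
The threshold behavior described above can be pictured with a small
model. This is a sketch only, not DMKMCH logic: the class and attribute
names are hypothetical, a Python list stands in for the error recording
cylinder, and resetting the counter when recording mode is re-entered is
an assumption. Bit 4 is taken in IBM bit numbering, where bit 0 is the
leftmost bit of the 32-bit control register.

```python
# Illustrative model only; names are hypothetical, not taken from DMKMCH.
SOFT_ERROR_THRESHOLD = 12          # from the text: 12 soft machine checks
CR14_BIT4 = 1 << (31 - 4)          # IBM bit numbering: bit 0 is leftmost


class SoftRecorder:
    def __init__(self):
        self.cr14 = CR14_BIT4      # recording mode is the initialized state
        self.soft_count = 0
        self.recorded = []         # stands in for the error recording cylinder

    @property
    def mode(self):
        return "record" if self.cr14 & CR14_BIT4 else "quiet"

    def soft_machine_check(self, report):
        """Handle one soft machine check; return True if it was recorded."""
        if self.mode == "quiet":
            return False           # no soft interruptions occur in quiet mode
        self.recorded.append(report)
        self.soft_count += 1
        if self.soft_count >= SOFT_ERROR_THRESHOLD:
            self.cr14 &= ~CR14_BIT4    # bit 4 set to zero: quiet mode
        return True

    def set_mode_record(self):
        """Model of re-entering recording mode by operator command."""
        self.cr14 |= CR14_BIT4
        self.soft_count = 0        # assumption: the counter starts over
```

In this model the twelfth soft machine check is still recorded, and the
switch to quiet mode takes effect for any that would follow.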
ECC

To achieve model-independent support, RMS does not set a specific mode
for ECC recording. The mode in which ECC recording is initialized depends
upon the hardware design for each specific processor model. For the IBM
System/370 Models 135, 135-3, 138, 145, 145-3, 148, 158, 168, 3031, 3032,
and 3033, the hardware-initialized state (therefore the normal
operational state for VM/370) is quiet mode. For the IBM System/370
Models 155 II and 165 II, the hardware-initialized state (the normal
operational state for VM/370) is record mode. An automatic restart
incident due to a VM/370 failure does not reset the ECC recording mode in
effect at the time of failure.
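
The model dependence above amounts to a lookup by processor model. The
following sketch restates it as a table; the function and set names are
hypothetical, and the model strings are simply the designations used in
the text.

```python
# Hypothetical lookup of the hardware-initialized ECC recording mode.
ECC_QUIET_MODELS = {
    "135", "135-3", "138", "145", "145-3", "148",
    "158", "168", "3031", "3032", "3033",
}
ECC_RECORD_MODELS = {"155 II", "165 II"}


def initial_ecc_mode(model):
    """Return 'quiet' or 'record' for a processor model named in the text."""
    if model in ECC_QUIET_MODELS:
        return "quiet"
    if model in ECC_RECORD_MODELS:
        return "record"
    raise ValueError(f"model {model!r} is not listed in the text")
```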
The change from record to quiet mode for ECC recording can be
initiated in either of the following ways: (1) by issuing the SET MODE
{MAIN|RETRY} QUIET command, or (2) automatically whenever 12 soft machine
checks have occurred. For the purpose of model-independent
implementation, this occurs by setting bit 4 of control register 14 to
zero.
The change from quiet to record mode for ECC recording can be
accomplished by use of the SET MODE MAIN RECORD command. This recording
mode option is for use by maintenance personnel only. It should be
noted that processor retry is placed in recording mode if it is not in
that state when the SET MODE MAIN RECORD command is issued.

While in recording mode, corrected ECC reports are formatted and
recorded on the error recording cylinder, but the primary systems
operator is not informed of these incidents.
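
The SET MODE rules given in this section can be collected into one
function. This is a sketch under stated assumptions, not the CP command
processor: the function name, the state dictionary, and the exception
types are hypothetical, and the handling of SET MODE RETRY RECORD (ECC
mode left unchanged) is an assumption, since the text does not say what
that command does to ECC recording.

```python
# Hypothetical model of the SET MODE rules described above; not CP code.
MAIN_INVALID_MODELS = {"3031", "3032", "3033"}


def set_mode(state, operand, setting, user_class, model):
    """Return the new {'retry': ..., 'ecc': ...} state after SET MODE."""
    if user_class != "F":
        raise PermissionError("SET MODE requires a class F user")
    if operand not in ("RETRY", "MAIN") or setting not in ("RECORD", "QUIET"):
        raise ValueError("expected SET MODE {RETRY|MAIN} {RECORD|QUIET}")
    if operand == "MAIN" and model in MAIN_INVALID_MODELS:
        raise ValueError("SET MODE MAIN is invalid on this processor")

    new_state = dict(state)
    if setting == "QUIET":
        # Bit 4 of control register 14 is set to zero, which disables both.
        new_state["retry"] = "quiet"
        new_state["ecc"] = "quiet"
    elif operand == "MAIN":
        # MAIN RECORD also places processor retry in recording mode.
        new_state["retry"] = "record"
        new_state["ecc"] = "record"
    else:
        # Assumption: RETRY RECORD leaves the ECC mode unchanged.
        new_state["retry"] = "record"
    return new_state
```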
1-156 IBM VM/370 System Logic and Problem Determination--Volume 1
On processor models equipped with a high-speed buffer (155 II, 158, 165
II, 168, 3031, 3032, 3033) or a data lookaside table (DLAT) (165 II,
168, 3031, 3032, 3033) the deletion of buffer blocks because of hardware
failure is reported via a degradation report machine check interruption.
MCH enables itself for degradation report machine check interruptions at
system initialization by setting bit 5 of control register 14 to 1. If
a machine check interruption occurs that indicates high-speed buffer or
DLAT damage, MCH formats the record and calls DMKIOEMC to record it on
the error recording cylinder, informs the primary systems operator of
the failure, and returns control to the system to continue normal
operation.
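
As a rough illustration of the degradation report path, the sketch below
sets bit 5 of a control register 14 value and walks through the three
actions MCH takes. All names are hypothetical, the two lists stand in for
the error recording cylinder and the operator console, and the record
layout and message wording are invented for the example. Bit 5 is taken
in IBM bit numbering (bit 0 is the leftmost bit).

```python
# Illustrative only; not the DMKMCH implementation.
CR14_BIT5 = 1 << (31 - 5)          # IBM bit numbering: bit 0 is leftmost

BUFFER_MODELS = {"155 II", "158", "165 II", "168", "3031", "3032", "3033"}
DLAT_MODELS = {"165 II", "168", "3031", "3032", "3033"}


def enable_degradation_reports(cr14):
    """Return cr14 with bit 5 set to 1, as done at system initialization."""
    return cr14 | CR14_BIT5


def handle_degradation_report(damage, error_cylinder, operator_console):
    """Record the failure, inform the operator, and continue operation."""
    record = {"type": "degradation report", "damage": damage}
    error_cylinder.append(record)          # stands in for the DMKIOEMC call
    operator_console.append(f"{damage} damage recorded")  # invented wording
    return "continue"
```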
The termination subroutine is given control if a hard machine check
interruption occurs while DMKMCH is in the process of handling a machine
check interruption. Note that soft error reporting is disabled for the
entire time that MCH is processing an error.
An analysis is performed of the machine check interruption code of
the first error to determine if it was a soft error. If it was, the
first error is recorded, the system status is restored and control is
restored to the point where the first error occurred. If the first
error was a hard error, the operator communication subroutine is given
control to issue a message directly to the system operator, and to
terminate CP operation.

OVERVIEW OF CHANNEL CHECK HANDLER
The channel check handler (CCH) aids the I/O supervisor in recovering from channel errors and informs the operator or service representative
of the occurrence of channel errors.
CCH receives control from the I/O supervisor when a channel data
check, channel control check, or interface control check occurs. CCH
produces an I/O error block (IOERBLOK) for the error recovery program and a record to be written on the error recording cylinder for the system operator or service representative. The operator or service
representative may obtain a copy of the record by using the CMS CPEREP
command. A message about the channel error is issued to the system
operator each time a record is written on the error recording cylinder.

When the I/O supervisor program detects a channel error during
routine status examination following an SIO, TIO, HIO, or an I/O
interruption, it passes control to the channel check handler (DMKCCH).
DMKCCH analyzes the channel logout information and constructs an
IOERBLOK and, if the error is a channel control or interface control
check, an ECSW is constructed and placed in the IOERBLOK. The IOERBLOK
provides information for the device-dependent error recovery procedures.
DMKCCH also constructs a record to be recorded on the error recording
cylinder. Normally, DMKCCH returns control to the I/O supervisor after
constructing an IOERBLOK and a record. However, if DMKCCH determines
that system integrity has been damaged (system reset or invalid unit
address, etc.), then CP operation is terminated. CP termination causes
DMKCCH to issue a message directly to the system operator and place the
processor in a disabled wait state with a recognizable wait code in the
processor instruction counter.
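
The decision flow described for DMKCCH can be summarized in a short
sketch. It is not the module's actual logic: the function name, the
dictionary layouts, and the returned action strings are hypothetical, and
the contents of the ECSW and of the error record are left as
placeholders.

```python
# Hypothetical outline of the channel check handler flow; not DMKCCH code.
CHANNEL_CHECKS = (
    "channel data check",
    "channel control check",
    "interface control check",
)


def handle_channel_error(check_type, integrity_damaged=False):
    """Build an IOERBLOK and an error record, then choose the next action."""
    if check_type not in CHANNEL_CHECKS:
        raise ValueError(f"not a channel error: {check_type!r}")

    ioerblok = {"check": check_type}
    if check_type in ("channel control check", "interface control check"):
        ioerblok["ecsw"] = {}      # placeholder: ECSW contents not modeled

    record = {"check": check_type}  # destined for the error recording cylinder

    if integrity_damaged:
        # CP termination: operator message, then a disabled wait state.
        action = "terminate CP"
    else:
        action = "return to I/O supervisor"
    return {"ioerblok": ioerblok, "record": record, "action": action}
```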