On processor models equipped with a high-speed buffer (155 II, 158, 165
II, 168, 3031, 3032, 3033) or a data lookaside table (DLAT) (165 II,
168, 3031, 3032, 3033) the deletion of buffer blocks because of hardware
failure is reported via a degradation report machine check interruption. MCH enables itself for degradation report machine check interruptions at
system initialization by setting bit 5 of control register 14 to 1. If
a machine check interruption occurs that indicates high-speed buffer or
DLAT damage, MCH formats the record and calls DMKIOEMC to record it on
the error recording cylinder, informs the primary systems operator of
the failure, and returns control to the system to continue normal operation.
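The following sketch illustrates the bit numbering involved by setting bit 5 in a 32-bit image of control register 14. It assumes only the System/370 convention that bit 0 is the leftmost (most significant) bit. The names `cr_bit_mask` and `enable_degradation_reports` are hypothetical, and real CP code would use the STCTL and LCTL instructions on the register itself, not a Python integer.

```python
CR_WIDTH = 32                 # System/370 control registers are 32 bits wide
DEGRADATION_REPORT_BIT = 5    # CR14 bit named in the text


def cr_bit_mask(bit: int) -> int:
    """Mask for a control-register bit, numbering bit 0 as the leftmost bit."""
    if not 0 <= bit < CR_WIDTH:
        raise ValueError("bit out of range")
    return 1 << (CR_WIDTH - 1 - bit)


def enable_degradation_reports(cr14: int) -> int:
    """Return a CR14 image with bit 5 set and all other bits unchanged."""
    return cr14 | cr_bit_mask(DEGRADATION_REPORT_BIT)


print(f"{enable_degradation_reports(0):08X}")  # 04000000
```

Because bit 0 is the high-order bit, bit 5 corresponds to X'04000000' in the register image.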
The termination subroutine is given control if a hard machine check
interruption occurs while DMKMCH is in the process of handling a machine
check interruption. Note that soft error reporting is disabled for the
entire time that MCH is processing an error.
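The reentrancy rule just described can be modeled as a small state machine. Once MCH begins handling an error, soft error reports are masked, and a hard machine check that arrives before handling completes is routed to the termination subroutine. The class, functions, and string results below are hypothetical stand-ins for DMKMCH's internal flags; this is a sketch of the control flow, not the module's logic.

```python
from dataclasses import dataclass


@dataclass
class McHandlerState:
    """Hypothetical flags standing in for DMKMCH's internal state."""
    in_progress: bool = False
    soft_reporting_enabled: bool = True


def route_machine_check(state: McHandlerState, hard: bool) -> str:
    """Decide which path receives a machine check interruption."""
    if state.in_progress:
        # An error is already being processed: soft reports are masked,
        # and a hard machine check goes to the termination subroutine.
        return "termination" if hard else "masked"
    # First error: begin normal handling and mask soft error reports.
    state.in_progress = True
    state.soft_reporting_enabled = False
    return "normal"


def finish_machine_check(state: McHandlerState) -> None:
    """Handling is complete: re-enable soft error reporting."""
    state.in_progress = False
    state.soft_reporting_enabled = True
```

Calling `route_machine_check(state, hard=True)` a second time before `finish_machine_check(state)` returns `"termination"`.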
The machine check interruption code of the first error is analyzed to
determine whether it was a soft error. If it was, the
first error is recorded, the system status is restored and control is
restored to the point where the first error occurred. If the first
error was a hard error, the operator communication subroutine is given
control to issue a message directly to the system operator, and to
terminate CP operation.
OVERVIEW OF CHANNEL CHECK HANDLER
The channel check handler (CCH) aids the I/O supervisor in recovering from channel errors and informs the operator or service representative
of the occurrence of channel errors.
CCH receives control from the I/O supervisor when a channel data
check, channel control check, or interface control check occurs. CCH
produces an I/O error block (IOERBLOK) for the error recovery program and a record to be written on the error recording cylinder for the system operator or service representative. The operator or service
representative may obtain a copy of the record by using the CMS CPEREP
command. A message about the channel error is issued to the system
operator each time a record is written on the error recording cylinder.
When the I/O supervisor program detects a channel error during
routine status examination following an SIO, TIO, HIO, or an I/O interruption, it passes control to the channel check handler (DMKCCH). DMKCCH analyzes the channel logout information and constructs an IOERBLOK; if the error is a channel control or interface control
check, it also constructs an ECSW and places it in the IOERBLOK. The IOERBLOK provides information for the device-dependent error recovery procedures. DMKCCH also constructs a record to be recorded on the error recording
cylinder. Normally, DMKCCH returns control to the I/O supervisor after
constructing an IOERBLOK and a record. However, if DMKCCH determines
that system integrity has been damaged (for example, by a system reset or an invalid unit
address), then CP operation is terminated. CP termination causes DMKCCH to issue a message directly to the system operator and place the
processor in a disabled wait state with a recognizable wait code in the
processor instruction counter.
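The DMKCCH decision described above can be summarized in a short sketch. It assumes a simplified view in which logout analysis has already produced an ECSW and an indication of whether system integrity was damaged, and it omits the error-cylinder record. The types and names are hypothetical; only the control flow (an ECSW only for channel control and interface control checks, termination only on integrity damage) follows the text.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class ChannelCheck(Enum):
    CHANNEL_DATA = auto()
    CHANNEL_CONTROL = auto()
    INTERFACE_CONTROL = auto()


@dataclass
class IoErrBlok:
    """Hypothetical stand-in for the IOERBLOK passed to the ERP."""
    check: ChannelCheck
    ecsw: Optional[bytes] = None


@dataclass
class CchResult:
    ioerblok: IoErrBlok
    terminate_cp: bool  # True: disabled wait instead of return to the I/O supervisor


def handle_channel_check(check: ChannelCheck, ecsw: bytes,
                         integrity_damaged: bool) -> CchResult:
    blok = IoErrBlok(check)
    # An ECSW is placed in the IOERBLOK only for channel control
    # and interface control checks.
    if check in (ChannelCheck.CHANNEL_CONTROL, ChannelCheck.INTERFACE_CONTROL):
        blok.ecsw = ecsw
    return CchResult(blok, terminate_cp=integrity_damaged)
```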
CP Introduction 1-151
Normally, when DMKCCH returns control to the I/O supervisor, the
error recovery program for the device which experienced the error is
scheduled. When the ERP receives control, it prepares to retry the
operation if analysis of the IOERBLOK indicates that retry is possible.
Depending on the device type and error condition, the ERP either effects
recovery or marks the event fatal and returns control to the I/O supervisor. The I/O supervisor calls the recording routine DMKIOE to
record the channel error.
The primary system operator is notified of the failure, and DMKIOE returns control to the system and normal processing continues.
If the channel check is associated with an I/O event initiated by a SIO in a virtual machine, the logout is reflected to the virtual machine in one of two ways, depending upon whether the channel check occurred at SIO time or later in an interrupt. If it occurred at SIO time, then DMKVSI (or occasionally DMKVIO) calls upon DMKCCHRF to reflect the
logout. If it occurred in an I/O interrupt, the dispatcher notices the
channel check as it is reflecting the I/O interrupt to the virtual
machine, and so, at that time, DMKDSP calls upon DMKCCHRF to reflect the
logout.
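The choice of reflecting path reduces to a small decision table. This sketch only returns the name of the module that calls DMKCCHRF in each case. The function and its `via_dmkvio` parameter are hypothetical, since the text does not say when DMKVIO rather than DMKVSI is the caller.

```python
def logout_reflector(at_sio_time: bool, via_dmkvio: bool = False) -> str:
    """Return the CP module that calls DMKCCHRF to reflect the logout.

    at_sio_time: the channel check occurred when the virtual SIO was issued;
                 otherwise it arrived later with an I/O interrupt.
    via_dmkvio:  hypothetical flag for the occasional DMKVIO path.
    """
    if at_sio_time:
        return "DMKVIO" if via_dmkvio else "DMKVSI"
    # The dispatcher notices the check while reflecting the I/O interrupt.
    return "DMKDSP"
```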
CHANNEL CONTROL SUBROUTINE
Control is passed to the channel control subroutine of DMKCCH after a SIO with failing status stored, or an I/O interrupt because of a channel
control check, interface control check, or channel data check.
If "logout pending" is indicated in the CSW, the CP termination flag
is set. The existence of real device blocks (RCHBLOK, RCUBLOK, RDEVBLOK) for the failing device address is determined by a call to DMKSCNRU, and an indicator is set if they do exist. An indicator is also
set if the IOBLOK for the failing device address exists. A call to DMKFREE obtains storage space for the channel check record and the
channel control subroutine builds the record. If the indicators show
that the real device blocks and the IOBLOK exist, a call to DMKFREE obtains storage space and the channel control subroutine builds the I/O error block (IOERBLOK); if these blocks do not exist, the IOERBLOK is
not built. The IOERBLOK is used for two purposes:
1. The device-dependent error recovery program (ERP) uses the IOERBLOK to attempt recovery on CP-initiated I/O events. If the I/O events that resulted in a channel check are associated with a
virtual machine, the I/O fatal flag is set in the IOBLOK and the
virtual machine is reset, cleared, and put into CP read status.
The length and address of the channel check record are placed in the IOERBLOK, and the IOERBLOK is chained off the IOBLOK.
2. DMKIOECC uses the IOERBLOK to record the channel check record on
the error recording cylinder.
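The channel control steps above can be sketched as follows. This is a simplified model under stated assumptions: the DMKSCNRU lookup is reduced to a boolean, the IOBLOK is a small hypothetical class, and a Python `bytes` object stands in for the storage DMKFREE would obtain, so the IOERBLOK holds the record object and its length instead of an address. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IoErrBlok:
    """Hypothetical IOERBLOK referring to the channel check record."""
    record: bytes          # stands in for the record's address
    record_length: int


@dataclass
class IoBlok:
    """Hypothetical IOBLOK for the failing device."""
    virtual: bool                       # I/O event belongs to a virtual machine
    io_fatal: bool = False
    ioerblok: Optional[IoErrBlok] = None


@dataclass
class ChannelControlResult:
    terminate_cp: bool
    ioerblok: Optional[IoErrBlok]
    reset_virtual_machine: bool


def channel_control(logout_pending: bool, device_blocks_exist: bool,
                    ioblok: Optional[IoBlok], record: bytes) -> ChannelControlResult:
    # "Logout pending" in the CSW sets the CP termination flag.
    terminate_cp = logout_pending
    ioerblok = None
    reset_vm = False
    # The IOERBLOK is built only if the real device blocks and the IOBLOK exist.
    if device_blocks_exist and ioblok is not None:
        ioerblok = IoErrBlok(record, len(record))
        if ioblok.virtual:
            # Virtual machine I/O: set the I/O fatal flag; the virtual
            # machine is then reset, cleared, and put into CP read status.
            ioblok.io_fatal = True
            reset_vm = True
        ioblok.ioerblok = ioerblok  # chain the IOERBLOK off the IOBLOK
    return ChannelControlResult(terminate_cp, ioerblok, reset_vm)
```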
The channel control subroutine gives control to a channel-dependent
error analysis routine to build or save the extended channel status word
(ECSW). When the channel control subroutine regains control, eight active addresses are saved in the channel check record.
If the CP termination flag is set, the I/O extended logout data from
the channel check record is restored to main storage for use by SEREP.
If the system operator is both logged on as a user and connected to the system, a message (DMKCCH603W) is sent to him advising him of the
channel error. A LPSW is then executed to place the processor in a
disabled wait state with a wait state code of 002 in the processor
instruction counter.
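For reference, the sketch below builds a 64-bit PSW image for such a disabled wait. It assumes the System/370 PSW layout in which bit 14 is the wait-state bit and bits 40-63 hold the instruction address. It leaves every other bit zero, including the interruption mask bits, so the processor remains disabled. The helper names are hypothetical.

```python
PSW_BITS = 64
WAIT_STATE_BIT = 14          # PSW bit 14: wait state
ADDRESS_FIELD_BITS = 24      # PSW bits 40-63: instruction address


def psw_bit(bit: int) -> int:
    """Mask for a PSW bit, numbering bit 0 as the leftmost bit."""
    return 1 << (PSW_BITS - 1 - bit)


def disabled_wait_psw(wait_code: int) -> int:
    """PSW with all mask bits off, the wait bit on, and the wait code
    in the instruction address field."""
    if not 0 <= wait_code < (1 << ADDRESS_FIELD_BITS):
        raise ValueError("wait code does not fit in 24 bits")
    return psw_bit(WAIT_STATE_BIT) | wait_code


print(f"{disabled_wait_psw(0x002):016X}")  # 0002000000000002
```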
1-158 IBM VM/370 System Logic and Problem Determination--Volume 1