If the space allocated for spooling on the system's DASD volumes is exhausted and more is requested by a virtual spooling
function, the user receives a message and a unit check intervention
required condition is reflected to the virtual output device that is
requesting the space. The output file is automatically closed and is
available for future processing. The user can clear the unit check and
periodically retry the operation, which starts when space becomes free, or restart the job later from the beginning. If the task
requesting the space is the real spooling reader task, the operator
receives an error message and the partially complete file is purged.
Any time the spooling space is exhausted, the operator is warned by a
console message. However, the system attempts to continue
normal operation.

RECOVERY FROM SYSTEM FAILURE

Should the system suffer an abnormal termination, CP attempts to perform
a warm start. Spool file and device data, as well as other system
information, is copied from real storage to warm start cylinders on DASD storage. When the system is reinitialized, the spool data and other
system data are retrieved from the warm start cylinders and operation
continues.
If the warm start data in real storage was damaged by the abnormal
termination, the warm start procedure recognizes the situation and
notifies the operator that a warm start cannot be performed. In that case,
another recovery method is to attempt a checkpoint start.
The spool file recovery routines (DMKCKS) dynamically checkpoint the following on DASD storage: the status of all open reader files, the status of all
closed output files, real spooling device data, and system hold queue
information. This information is stored on checkpoint cylinders that
are allocated, along with warm start cylinders, at system generation.
When a checkpoint (CKPT) start is requested, spool file and spooling
device information is retrieved from the checkpoint cylinders. Spool file blocks are chained to their appropriate reader, printer, or punch
chains; record allocation blocks are reconstructed; spooling device
status is restored; and system hold queues are chained to the proper
devices. System operation then continues.
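The chaining step described above can be pictured with a minimal sketch. It is illustrative only: the record layout (`CheckpointRecord`), its field names, and the idea of modeling chains as Python lists are assumptions made for this example, since the text does not describe the format of the checkpoint cylinders or of the spool file blocks.

```python
from dataclasses import dataclass

# Hypothetical record layout; the real checkpoint cylinder format is not
# described in the text above.
@dataclass
class CheckpointRecord:
    spool_id: int
    device_type: str    # "reader", "printer", or "punch"
    held: bool = False  # assumption: True if the file is on a system hold queue

DEVICE_TYPES = ("reader", "printer", "punch")

def rebuild_chains(records):
    """Place each checkpointed spool file on the chain for its device type."""
    chains = {dev: [] for dev in DEVICE_TYPES}
    hold_queues = {dev: [] for dev in DEVICE_TYPES}
    for rec in records:
        if rec.device_type not in chains:
            raise ValueError(f"unknown device type: {rec.device_type!r}")
        # Held files go to the hold queue for their device type,
        # all others to the ordinary chain.
        target = hold_queues if rec.held else chains
        target[rec.device_type].append(rec.spool_id)
    return chains, hold_queues

records = [
    CheckpointRecord(1, "printer"),
    CheckpointRecord(2, "reader"),
    CheckpointRecord(3, "printer", held=True),
    CheckpointRecord(4, "punch"),
]
chains, hold_queues = rebuild_chains(records)
print(chains)
print(hold_queues)
```

Keeping the ordinary chains and the hold queues in separate structures mirrors the distinction the text draws between the two, but how CP actually links the blocks is not shown here.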
If the checkpoint start procedure encounters I/O errors or invalid DASD data on the checkpoint cylinders, the operator is notified. The
FORCE option of the checkpoint start performs all the checkpoint start
functions, except that invalid or unreadable files are bypassed. While this is at best a partial recovery, the only other alternative is a cold
(COLD) start, where all spool file data is lost.

RECOVERY MANAGEMENT SUPPORT (RMS)

The machine check handler (MCH) minimizes lost computing time caused by machine malfunction. MCH does this by attempting to correct the
malfunction immediately, and by producing machine check records and
messages to assist the service representatives in determining the cause
of the problem.
The channel check handler (CCH) aids the I/O supervisor (DMKIOS) to
recover from channel errors. CCH provides the device-dependent error
recovery programs (ERPs) with the information needed to retry a channel
operation that has failed.
This support is standard and model-independent on the external level (from the user's point of view there are no considerations, at system generation time, for model dependencies).

SYSTEM INITIALIZATION FOR RMS

DMKCPI calls DMKIOEFL to initialize the error recording at cold start
and warm start. DMKIOEFL gives control to DMKIOG to initialize the MCH area. A Store CPU ID (STIDP) instruction is performed to determine whether VM/370 is running in a virtual machine environment or
standalone on the real machine. If VM/370 is running in a virtual
machine, the version code is set to X'FF' by DMKPRV. If the version
code returned is X'FF', the RMS functions are not initialized beyond
setting the wait bit on in the (virtual) machine check new PSW. This is because machine check interruptions are not reflected to any
virtual machine. VM/370, running on the real machine, determines
whether the virtual machine should be terminated.
If the version code is not X'FF', DMKIOG determines what channels are
online by performing a Store Channel ID (STIDC) instruction and saves
the channel type for each channel that is online. The maximum machine check extended logout length (MCEL) indicated by the Store CPU ID (STIDP) instruction is added to the length of the MCH record header,
fixed logout length, and damage assessment data field. DMKIOG then calls DMKFRE to obtain the storage needed for the MCH record area (MCRECORD), the CP execution block (CPEXBLOK), MCHAREA, and MCEL. The address of MCHAREA is put in the PSA (AMCHAREA). Pointers to MCRECORD and the CPEXBLOK are put in MCHAREA. DMKIOG puts the address of MCEL in control register 15. DMKIOG obtains the storage for the I/O extended logout area and initializes the logout area and the ECSW to
ones. The I/O extended logout pointer is saved at location 172 and
control register 15 is initialized with the address of the extended
logout area. The length of the CCH record and the online channel types
are saved in DMKCCH. Note that the ability of a CPU to
produce an extended logout or I/O extended logout, and the length of the
logouts, are both model- and channel-dependent. If VM/370 is being
initialized on a Model 165 II or 168, the 2860, 2870, and 2880 standalone channel modules are loaded and locked by the paging
supervisor and the pointers are saved in DMKCCH. If VM/370 is being
initialized on any other model, integrated channel support is
assumed; this support is part of the channel control subroutine of DMKCCH. Before returning to DMKIOE, the VM/370 error recording
cylinders are initialized. DMKIOE passes control back to DMKCPI and
control register 14 is initialized with the proper mask to record
machine checks.

OVERVIEW OF MACHINE CHECK HANDLER

A machine malfunction can originate from the processor, real storage, or
control storage. When any of these fails to work properly, the processor
attempts to correct the malfunction. When the malfunction is corrected, the machine check handler (MCH) is
notified by a machine check interruption and the processor logs out
fields of information in real storage, detailing the cause and nature of
the error. The model-independent data is stored in the fixed logout area
1-150 IBM VM/370 System Logic and Problem Determination--Volume 1