SPOOL FILE ERROR RECOVERY
I/O errors on real spooling unit record devices are handled by a
transient routine that is called by DMKIOS after it has sensed the unit
check associated with the error on a spooling device. If appropriate, a
restart CAW is calculated and DMKIOS is requested to retry the
operation, in some cases waiting for a device end that signals that the
failing device has been made ready after manual corrective measures have
been taken. If the error is still unrecoverable after the retry,
DMKIOS is informed that a fatal error has occurred. DMKIOS then unstacks the interruption, flagged as a fatal error, and
passes control to the real spooling executive. The routines that handle
unstacked interruptions in real spooling accept only operations
that have completed successfully or that ended in a fatal error. If a
fatal error is unstacked, the recovery mechanism depends on the
operation in progress.
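The retry decision described above can be sketched as follows. This is a minimal Python illustration, not the DMKIOS or error-recovery code itself (which is System/370 assembler). The names `UnitCheck`, `handle_unit_check`, `restart_io`, and `wait_for_device_end` are hypothetical, and so is the retry limit of 3, because the manual does not state how many retries are made.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UnitCheck:
    """Hypothetical summary of the sense data for one unit check."""
    retryable: bool              # whether a retry is appropriate at all
    intervention_required: bool  # device must be readied manually first


def handle_unit_check(error: UnitCheck,
                      restart_io: Callable[[], Optional[UnitCheck]],
                      wait_for_device_end: Callable[[], None],
                      max_retries: int = 3) -> str:
    """Return "completed" or "fatal", the only two outcomes that the
    real spooling executive is prepared to unstack.

    restart_io reissues the operation from the recalculated restart CAW
    and returns None on success or a new UnitCheck on failure.
    max_retries is an assumed limit, not a documented value.
    """
    attempts = 0
    while error.retryable and attempts < max_retries:
        if error.intervention_required:
            # Wait for the device end that signals the device is ready again.
            wait_for_device_end()
        attempts += 1
        result = restart_io()
        if result is None:
            return "completed"
        error = result
    # Retries exhausted or not appropriate: report a fatal error.
    return "fatal"
```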
For fatal reader errors, processing of the current file is terminated and any portion of the file that has been read and stored on disk is
purged. The owner of the file is not informed of the presence of a
fractional part of the file in the system.
For fatal printer or punch errors, the SFBLOK for the partially completed file is re-queued to the appropriate output list and
processing can be resumed by another available printer or punch, or can
be deferred until the failing device is repaired.
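The fatal-error handling for readers, printers, and punches can be summarized in a short Python sketch. `SpoolFile` stands in for an SFBLOK; `Device`, `handle_fatal_error`, the output-list dictionary, and the free-page list are hypothetical. Appending the re-queued file to the end of its output list is an assumption, since the manual does not say where in the list it is placed. The sketch also marks the device logically offline, which the next paragraph requires for every fatal error.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class SpoolFile:
    """Hypothetical stand-in for an SFBLOK."""
    spoolid: int
    pages: List[int] = field(default_factory=list)  # DASD pages holding the file


@dataclass
class Device:
    kind: str            # "reader", "printer", or "punch"
    online: bool = True


def handle_fatal_error(dev: Device, sfile: SpoolFile,
                       output_lists: Dict[str, Deque[SpoolFile]],
                       free_pages: List[int]) -> None:
    if dev.kind == "reader":
        # Purge the partial input file and release its pages; the file's
        # owner is never told that the fragment existed.
        free_pages.extend(sfile.pages)
        sfile.pages.clear()
    elif dev.kind in ("printer", "punch"):
        # Re-queue the partially completed file so another device, or this
        # one after repair, can process it later.
        output_lists[dev.kind].append(sfile)
    else:
        raise ValueError(f"not a unit record spooling device: {dev.kind}")
    # In any case, the device is not used until the operator varies it online.
    dev.online = False
```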
In any case, the failing device is marked logically offline, and no attempt is made by the system to use it until the operator varies it
back online via the VARY command.
If an invalid load module is specified for a 3800 printer (refer to DIAGNOSE code X'14'), the file involved is held or purged, and the
printer queue is searched for the next file to print. In addition, the
user and operator are sent a message (DMKRSE241E) describing the action.
DASD I/O errors for page writes are transparent to the user. A new page
for the buffer is assigned, the file linkage pointers are adjusted, and
the buffer is rewritten. The failing page is not de-allocated and no
subsequent request for page space is granted access to the failing page.
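The write-error path amounts to swapping a fresh page into the file's chain and permanently retiring the bad one. In this Python sketch the file linkage is simplified to a list of page numbers, and `SpoolPageAllocator`, `write_buffer`, and the `write_page` callback are hypothetical names; CP's actual record allocation blocks are not modeled.

```python
from typing import Callable, List, Set


class SpoolPageAllocator:
    """Hypothetical pool of free DASD pages for spool data."""

    def __init__(self, n_pages: int) -> None:
        self.free: List[int] = list(range(n_pages))
        self.retired: Set[int] = set()  # pages that failed a write

    def allocate(self) -> int:
        if not self.free:
            raise RuntimeError("spooling space exhausted")
        return self.free.pop(0)

    def release(self, page: int) -> None:
        # A retired page never re-enters the pool, so no later request
        # for page space can be granted it.
        if page not in self.retired:
            self.free.append(page)


def write_buffer(links: List[int], index: int, data: bytes,
                 alloc: SpoolPageAllocator,
                 write_page: Callable[[int, bytes], bool]) -> int:
    """Write buffer `index` of a file whose page chain is `links`.

    Errors are not surfaced to the user: each failing page is retired,
    a new page is assigned, and the write is repeated.
    """
    page = links[index]
    while not write_page(page, data):
        alloc.retired.add(page)   # failing page is not de-allocated
        page = alloc.allocate()   # assign a new page for the buffer
        links[index] = page       # adjust the file linkage
    return page
```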
If an unrecoverable error is encountered while reading a page,
processing depends on the routine that is reading the file. If the
processing is being done for a virtual reader, the user is informed of
the error and a unit check/intervention required condition is reflected
to the reader. If the processing is being done for a real printer or
punch, the failing buffer is put into the system hold status, and
processing continues with the next file. In either case, the DASD page
is not de-allocated and it is not available for the use of other tasks.
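The read-error handling splits on which routine is reading the file. A minimal Python sketch follows; `ReadFailure`, `handle_read_error`, the consumer labels, and the message text are all hypothetical, since the manual does not give the wording of the message sent to the user.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ReadFailure:
    consumer: str   # "virtual_reader" or "real_output" (hypothetical labels)
    spoolid: int
    page: int       # DASD page that could not be read


def handle_read_error(f: ReadFailure, user_messages: List[str],
                      system_hold: List[int], retired_pages: Set[int]) -> str:
    # In either case the page stays allocated and is never reused.
    retired_pages.add(f.page)
    if f.consumer == "virtual_reader":
        user_messages.append(f"spool file {f.spoolid}: unrecoverable read error")
        # Condition reflected to the user's virtual reader.
        return "unit check/intervention required"
    if f.consumer == "real_output":
        # Real printer or punch: put the file in system hold and move on.
        system_hold.append(f.spoolid)
        return "next file"
    raise ValueError(f"unknown consumer: {f.consumer}")
```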
1-148 IBM VM/370 System Logic and Problem Determination--Volume 1
If the space allocated for paging and spooling on the system's DASD volumes is exhausted and more is requested by a virtual spooling
function, the user receives a message and a unit check/intervention
required condition is reflected to the virtual output device that is
requesting the space. The output file is automatically closed and
remains available for later processing. The user can clear the unit check and
periodically retry the operation, which proceeds once space is freed, or restart the job from the beginning later. If the task
requesting the space is the real spooling reader task, the operator
receives an error message and the partially complete file is purged. Any time the spooling space is exhausted, the operator is warned by a
console message; however, the system attempts to continue
normal operation.
RECOVERY FROM SYSTEM FAILURE
Should the system suffer an abnormal termination, CP attempts to perform
a warm start. Spool file and device data, as well as other system
information, is copied from real storage to warm start cylinders on DASD storage. When the system is reinitialized, the spool data and other
system data is retrieved from the warm start cylinders and operation
continues.
If the warm start data in real storage was damaged by the abnormal
termination, the warm start procedure recognizes the situation and
notifies the operator that a warm start cannot be performed. Another
recovery method is to attempt a checkpoint start.
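The manual does not say how the warm start procedure recognizes damaged data. The Python sketch below assumes a few plausible consistency checks (a known device chain, a unique spool ID, page numbers inside the spool area); `warm_start_data_valid`, `choose_start`, and the record layout are hypothetical. If the checks fail, the operator is told that a warm start cannot be performed and may try a checkpoint start instead.

```python
from typing import Callable, Dict, List

VALID_CHAINS = {"reader", "printer", "punch"}


def warm_start_data_valid(files: List[Dict], n_pages: int) -> bool:
    """Assumed consistency checks on the saved spool file data."""
    seen = set()
    for f in files:
        sid = f.get("spoolid")
        if sid is None or sid in seen or f.get("chain") not in VALID_CHAINS:
            return False
        if any(not (0 <= p < n_pages) for p in f.get("pages", [])):
            return False
        seen.add(sid)
    return True


def choose_start(files: List[Dict], n_pages: int,
                 notify_operator: Callable[[str], None]) -> str:
    if warm_start_data_valid(files, n_pages):
        return "warm"
    notify_operator("warm start cannot be performed")
    # The operator may then attempt a checkpoint (CKPT) start.
    return "checkpoint"
```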
The spool file recovery routines (DMKCKS) dynamically checkpoint, on DASD storage, the status of all open reader files, the status of all
closed output files, real spooling device data, and system hold queue
information. This information is stored on checkpoint cylinders that
are allocated, along with warm start cylinders, at system generation. When a checkpoint (CKPT) start is performed, spool file and spooling
device information is retrieved from the checkpoint cylinders. Spool file blocks are chained to their appropriate reader, printer or punch
chains; record allocation blocks are reconstructed; spooling device
status is restored; and system hold queues are chained to the proper
devices. System operation then continues.
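A checkpoint start rebuilds in-storage structures from the checkpoint records. This Python sketch is illustrative only: `CheckpointRecord`, `ckpt_start`, and `CheckpointError` are hypothetical, an unreadable record is represented by `None`, and restoring real spooling device status is omitted. The `force` flag mimics the FORCE option discussed in the next paragraph by skipping bad records instead of stopping.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

CHAINS = ("reader", "printer", "punch")


@dataclass
class CheckpointRecord:
    """Hypothetical layout of one checkpointed spool file."""
    spoolid: int
    chain: str           # device chain the spool file block belongs on
    pages: List[int]     # DASD pages the file occupies
    held: bool = False   # on a system hold queue


class CheckpointError(Exception):
    pass


def ckpt_start(records: List[Optional[CheckpointRecord]], force: bool = False
               ) -> Tuple[Dict[str, List[int]], Set[int], Dict[str, List[int]]]:
    chains: Dict[str, List[int]] = {c: [] for c in CHAINS}
    hold: Dict[str, List[int]] = {c: [] for c in CHAINS}
    allocated: Set[int] = set()
    for rec in records:
        # None models a record that could not be read (I/O error).
        if rec is None or rec.chain not in CHAINS:
            if force:
                continue  # bypass the invalid or unreadable file
            raise CheckpointError("invalid checkpoint data; notify operator")
        chains[rec.chain].append(rec.spoolid)  # rechain the spool file block
        allocated.update(rec.pages)            # rebuild record allocation
        if rec.held:
            hold[rec.chain].append(rec.spoolid)  # rechain hold queue entry
    return chains, allocated, hold
```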
If the checkpoint start procedure encounters I/O errors or invalid DASD data on the checkpoint cylinders, the operator is notified. The
FORCE option of the checkpoint start performs all the checkpoint start
functions, except that invalid or unreadable files are bypassed. While this is at best a partial recovery, the only alternative is a cold
(COLD) start, in which all spool file data is lost.
RECOVERY MANAGEMENT SUPPORT (RMS)
The machine check handler (MCH) minimizes lost computing time caused by machine malfunction. MCH does this by attempting to correct the
malfunction immediately, and by producing machine check records and
messages to assist the service representatives in determining the cause
of the problem.
CP Introduction 1-149