Apple Computer’s Mail Disaster of 1991

For example, before sendmail will accept a message (by returning exit status or sending a response code) it insures that all information needed to deliver that message is forced out to the disk. In this way, sendmail has “accepted responsibility” for delivery of the message (or notification of failure). If the message is lost prior to acceptance, it is the “fault” of the sender; if lost after acceptance, it is the “fault” of the receiving sendmail.

This algorithm implies that a window exists where both sender and receiver believe that they are “responsible” for this message. If a failure occurs during this window then two copies of the message will be delivered. This is normally not a catastrophic event, and is far superior to losing a message.

This design choice to deliver two copies of a message rather than none at all might indeed be far superior in most circumstances. Certainly, lost mail is a bad thing. On the other hand, techniques for guaranteeing synchronous, atomic operations, even for processes running on two separate computers, were known and understood in 1983 when sendmail was written.

Date: Thu, 09 May 91 23:26:50 -0700
From: “Erik E. Fair” [6] (Your Friendly Postmaster) fair@apple.com
To: tcp-ip@nic.ddn.mil, unicode@sun.com, [...]
Subject: Case of the Replicated Errors: An Internet Postmaster’s Horror Story

This Is The Network: The Apple Engineering Network.

The Apple Engineering Network has about 100 IP subnets, 224 AppleTalk zones, and over 600 AppleTalk networks. It stretches from Tokyo, Japan, to Paris, France, with half a dozen locations in the U.S., and 40 buildings in the Silicon Valley. It is interconnected with the Internet in three places: two in the Silicon Valley, and one in Boston. It supports almost 10,000 users every day.

When things go wrong with e-mail on this network, it’s my problem. My name is Fair. I carry a badge.
[6] Erik Fair graciously gave us permission to reprint this message, which appeared on the TCP-IP, UNICODE, and RISKS mailing lists, although he added: “I am not on the UNIX-HATERS mailing list. I have never sent anything there personally. I do not hate Unix; I just hate USL, Sun, HP, and all the other vendors who have made Unix FUBAR.”
[insert theme from Dragnet]

The story you are about to read is true. The names have not been changed so as to finger the guilty.

It was early evening, on a Monday. I was working the swing shift out of Engineering Computer Operations under the command of Richard Herndon. I don’t have a partner.

While I was reading my e-mail that evening, I noticed that the load average on apple.com, our VAX-8650, had climbed way out of its normal range to just over 72.

Upon investigation, I found that thousands of Internet hosts [7] were trying to send us an error message. I also found 2,000+ copies of this error message already in our queue.

I immediately shut down the sendmail daemon which was offering SMTP service on our VAX.

I examined the error message, and reconstructed the following sequence of events:

We have a large community of users who use QuickMail, a popular Macintosh-based e-mail system from CE Software. In order to make it possible for these users to communicate with other users who have chosen to use other e-mail systems, ECO supports a QuickMail to Internet e-mail gateway. We use RFC822 Internet mail format, and RFC821 SMTP as our common intermediate e-mail standard, and we gateway everything that we can to that standard, to promote interoperability.

The gateway that we installed for this purpose is MAIL*LINK SMTP from Starnine Systems. This product is also known as GatorMail-Q from Cayman Systems. It does gateway duty for all of the 3,500 QuickMail users on the Apple Engineering Network.

[7] Erik identifies these machines simply as “Internet hosts,” but you can bet your cookies that most of them were running Unix.

Many of our users subscribe, from QuickMail, to Internet mailing lists which are delivered to them through this gateway. One such