240 System Administration

Where Did I Go Wrong?

Date: Thu, 20 Dec 90 18:45 CST
From: Chris Garrigues <7thSon@slcs.slb.com>
To: UNIX-HATERS
Subject: Support of Unix machines

I was thinking the other day about how my life has changed since Lisp Machines were declared undesirable around here.

Until two years ago, I was single-handedly supporting about 30 LispMs. I was doing both hardware and software support. I had time to hack for myself. I always got the daily paper read before I left in the afternoon, and often before lunch. I took long lunches and rarely stayed much after 5pm. I never stayed after 6pm. During that year and a half, I worked one (1) weekend. When I arrived, I thought the environment was a mess, so I put in that single weekend to fix the namespace (which lost things mysteriously) and moved things around. I reported bugs to Symbolics and when I wasn’t ignored, the fixes eventually got merged into the system.

Then things changed. Now I’m one of four people supporting about 50 Suns. We get hardware support from Sun, so we’re only doing software. I also take care of our few remaining LispMs and our Cisco gateways, but they don’t require much care. We have an Auspex, but that’s just a Sun which was designed to be a server.

I work late all the time. I work lots of weekends. I even sacrificed my entire Thanksgiving weekend. Two years later, we’re still cleaning up the mess in the environment and it’s full of things that we don’t understand at all. There are multiple copies of identical data which we’ve been unable to merge (mostly lists of the hosts at our site).

Buying the Auspex brought us from multiple single points of failure to one huge single point of failure. It’s better, but it seems that in my past, people frequently didn’t know that a server was down until it came back up. Even with this, when the mail server is down, “pwd” still fails and nobody, including root, can log in.
Running multiple versions of any software from the OS down is awkward at best, impossible at worst. New OS versions cause things to break due to shared libraries. I report bugs to Sun and when I’m not ignored, I’m told that that’s the way it’s supposed to work.

Where did I go wrong?