We found the segment of code that was causing the problem. When Emacs tries to open a file to edit, it tries to do an exclusive create on the superlock file. If the exclusive create fails, it tries 19 more times with a one-second delay between each try. After 20 tries it just ignores the lock file being there and opens the file the user wanted. If it succeeds in creating the lock file, it opens the user’s file and then immediately removes the lock file.

The problem we had was that /usr/lib/emacs/lock was mounted over NFS, and apparently NFS doesn’t handle exclusive create as well as one would hope. The command would create the file, but return an error saying it didn’t. Since Emacs thinks it wasn’t able to create the lock file, it never removes it. But since it did create the file, all future attempts to open files encounter this lock file and force Emacs to go through a 20-second loop before proceeding. That was what was causing the delay.

The hack we used to cure this problem was to make /usr/lib/emacs/lock be a symbolic link to /tmp, so that it would always point to a local directory and avoid the NFS exclusive create bug. I know this is far from perfect, but so far it is working correctly.

Thanks to everyone who responded to my plea for help. It’s nice to know that there are so many friendly people on the net.

The freezing is exacerbated by any program that needs to obtain the name of the current directory. Unix still provides no simple mechanism for a process to discover its “current directory.” If you have a current directory, “.”, the only way to find out its name is to open the contained directory “..” (which is really the parent directory) and then to search that directory for an entry that has the same inode number as the current directory, “.”. That entry’s name is the name of your directory. Repeating the procedure one level up, and so on until you reach the root, yields the full pathname. (Notice that this process fails with directories that are the target of symbolic links.)
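The Emacs lock protocol from the forwarded message can be sketched in Python. This is only an illustration and does not reproduce Emacs’s actual code. The function names `acquire_superlock` and `open_for_edit` and the lock-file name `superlock` are hypothetical, and the sketch assumes that the “exclusive create” corresponds to `open()` with `O_CREAT | O_EXCL`. A local file system cannot reproduce the NFS bug, so the demo simulates its effect by leaving a lock file behind before the second open.

```python
import os
import tempfile
import time


def acquire_superlock(lock_path, tries=20, delay=1.0):
    """Hypothetical sketch: try to create lock_path exclusively.

    Makes up to `tries` attempts, sleeping `delay` seconds between
    attempts. Returns True if this process created the lock file, or
    False if it gave up and will proceed without the lock.
    """
    for attempt in range(tries):
        try:
            # O_CREAT | O_EXCL: the create fails (EEXIST) if the file exists.
            fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            if attempt < tries - 1:
                time.sleep(delay)  # no sleep after the final attempt
            continue
        os.close(fd)
        return True
    return False  # after `tries` attempts, just ignore the lock file


def open_for_edit(path, lock_path, tries=20, delay=1.0):
    """Read `path`, removing the lock afterward only if we created it."""
    locked = acquire_superlock(lock_path, tries, delay)
    try:
        with open(path) as f:
            return f.read()
    finally:
        if locked:
            os.remove(lock_path)


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    target = os.path.join(d, "notes.txt")
    lock = os.path.join(d, "superlock")  # hypothetical lock-file name
    with open(target, "w") as f:
        f.write("hello\n")
    open_for_edit(target, lock, delay=0.01)  # fast: lock created, then removed
    open(lock, "w").close()  # simulate the lock NFS left behind
    start = time.monotonic()
    open_for_edit(target, lock, delay=0.01)  # sleeps 19 times, then gives up
    print(f"stale lock cost {time.monotonic() - start:.2f}s; "
          f"lock still present: {os.path.exists(lock)}")
```

With the default one-second delay, the stale-lock case sleeps 19 times before giving up, which is about the 20-second pause the message describes. Only the process that believes it created the lock ever removes it. A create that succeeded but reported failure therefore leaves the lock in place for good.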
Fortunately, this directory walk is automated for you by a function called getcwd(). Unfortunately, programs that use getcwd() can freeze unexpectedly. Carl R. Manning at the MIT AI Lab got bitten by this bug in late 1990.
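The directory walk that getcwd() automates can be sketched in Python. It illustrates the technique only and is not the code of any real getcwd(); Linux, for example, answers getcwd with a system call. The name `walk_getcwd` is hypothetical. The sketch compares device numbers as well as inode numbers, because an inode number is unique only within a single file system.

```python
import os


def walk_getcwd():
    """Hypothetical sketch: rebuild the current path by walking '..'.

    At each level, stat '.', list its parent '..', and find the entry
    whose (device, inode) pair matches. Stop when '..' is the same
    directory as '.', which happens only at the root.
    """
    names = []
    path = "."
    while True:
        here = os.stat(path)
        parent_path = os.path.join(path, "..")
        parent = os.stat(parent_path)
        if (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino):
            break  # at the root, '..' refers back to '/' itself
        for name in os.listdir(parent_path):
            try:
                # lstat each entry; for a mount point this reaches into the
                # mounted file system, which may block if its server is down.
                st = os.lstat(os.path.join(parent_path, name))
            except OSError:
                continue  # entry vanished or cannot be examined; skip it
            if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                names.append(name)
                break
        else:
            raise FileNotFoundError("current directory not found in its parent")
        path = parent_path  # move one level up: ./.., ./../.., ...
    return "/" + "/".join(reversed(names))


if __name__ == "__main__":
    print(walk_getcwd())
```

Each level scans the parent directory and calls lstat() on its entries. In this sketch, a mount point belonging to an unresponsive server, sitting in any directory along the way, can stall the whole call.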
Date: Wed, 12 Dec 90 15:07 EST
From: Jerry Roylance <glr@ai.mit.edu>
Subject: Emacs needs all file servers? (was: AB going down)
To: CarlManning@ai.mit.edu⁵
Cc: SYSTEM-HACKERS@ai.mit.edu, SUN-FORUM@ai.mit.edu

Date: Wed, 12 Dec 90 14:16 EST
From: Carl R. Manning <CarlManning@ai.mit.edu>

Out of curiosity, is there a good reason why Emacs can’t start up (e.g., on rice-chex) when any of the file servers are down? E.g., when AB or WH have been down recently for disk problems, I couldn’t start up an Emacs on RC, despite the fact that I had no intention of touching any files on AB or WH.

Sun brain damage. Emacs calls getcwd, and getcwd wanders down the mounted file systems in /etc/mtab. If any of those file systems is not responding, Emacs waits for the timeout. An out-to-lunch file system would be common on public machines such as RC. (Booting RC would fix the problem.)

Booting rice-chex would fix the problem. How nice! Hope you aren’t doing anything else important on the machine.

Not Supporting Multiple Architectures

Unix was designed in a homogeneous world. Unfortunately, maintaining a heterogeneous world (even with hosts all from the same vendor) requires amazingly complex mount tables and file system structures, and even so, some directories (such as /usr/etc) contain a mix of architecture-specific and architecture-independent files. Unlike other network file systems (such as the Andrew File System), NFS makes no provisions for the fact that different kinds of clients might need to “see” different files in the same place of their file systems. Unlike other operating systems (such as Mach), Unix makes no provision for stuffing multiple architecture-specific object modules into a single file. You can see what sort of problems breed as a result:

⁵ Forwarded to UNIX-HATERS by Steve Robbins.