Hmm, perhaps not so robust after all. Unix file names can have any character in them (except slash). A space in a filename will break this script, since the shell will parse it as two file names. Well, that’s not too hard to deal with. We'll just change the IFS to not include Space (or Tab while we're at it), and carefully quote (not too little, not too much!) our variables, like this:

    IFS='
    '
    for file in `ls -f`
    do
        if [ "$file" != . -a "$file" != .. ]
        then
            file "$file"
        fi
    done

Some of you alert people will have already noticed that we have made the problem smaller, but we haven't eliminated it, because Linefeed is also a legal character in a filename, and it is still in IFS.

Our script has lost some of its simplicity, so it is time to reevaluate our approach. If we removed the “ls” then we wouldn’t have to worry about parsing its output. What about

    for file in .* *
    do
        if [ "$file" != . -a "$file" != .. ]
        then
            file "$file"
        fi
    done

Looks good. Handles dot files and files with nonprinting characters. We keep adding more strangely named files to our test directory, and this script continues to work. But then someone tries it on an empty directory, and the * pattern produces “No such file.” But we can add a check for that…

…at this point my message is probably getting too long for some of your uucp mailers, so I'm afraid I'll have to close here and leave fixing the remaining bugs as an exercise for the reader.

Stephen
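
For readers who would rather not do the exercise, here is a minimal sketch of one way to add that check, assuming a POSIX sh. The function name for_each_entry is made up for illustration and does not appear in the original message. It keeps the glob-based loop, skips the . and .. entries, and ignores a pattern that matched nothing by testing whether the entry really exists.

```sh
# Hypothetical helper (not from the original message):
# run a command once for every entry in a directory, dot files included,
# without parsing the output of ls.
#
# Example: for_each_entry . file
for_each_entry() {
    dir=$1
    shift
    for entry in "$dir"/.* "$dir"/*
    do
        # Strip the directory prefix to get the bare name.
        name=${entry##*/}
        # Skip the directory itself and its parent.
        case $name in
            .|..) continue ;;
        esac
        # A pattern that matched nothing is left as a literal string;
        # skip it unless something by that name exists (-L also
        # catches dangling symbolic links).
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # Quoting keeps spaces, tabs, and linefeeds in the name intact.
        "$@" "$entry"
    done
}
```

Calling for_each_entry . file runs file on each entry of the current directory. Because every path is passed with its directory prefix, a name that begins with a dash is not mistaken for an option. Plain sh has no local variables, so dir, entry, and name leak into the calling shell; that is a limitation of the sketch, not a feature.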
There is another big problem as well, one that we’ve been glossing over from the beginning. The Unix file program doesn’t work.

Date: Sat, 25 Apr 92 17:33:12 EDT
From: Alan Bawden <Alan@lcs.mit.edu>
Subject: Simple Shell Programming
To: UNIX-HATERS

WHOA! Hold on a second. Back up. You're actually proposing to use the ‘file’ program? Everybody who wants a good laugh should pause right now, find a Unix machine, and try typing “file *” in a directory full of miscellaneous files.

For example, I just ran ‘file’ over a directory full of C source code—here is a selection of the results:

    arith.c:        c program text
    binshow.c:      c program text
    bintxt.c:       c program text

So far, so good. But then:

    crc.c:          ascii text

See, ‘file’ isn’t looking at the “.c” in the filename, it’s applying some heuristics based on an examination of the contents of the file. Apparently crc.c didn’t look enough like C code—although to me it couldn’t possibly be anything else.

    gencrc.c.~4~:   ascii text
    gencrc.c:       c program text

I guess I changed something after version 4 that made gencrc.c look more like C…

    tcfs.h.~1~:     c program text
    tcfs.h:         ascii text

while tcfs.h looked less like C after version 1.

    time.h:         English text

That’s right, time.h apparently looks like English, rather than just ascii. I wonder if ‘file’ has recognition rules for Spanish or French?