UFS: The Root of All Evil

UFS lives in a strange world where the computer’s hard disk is divided into three different parts: inodes, data blocks, and the free list. Inodes are pointers to blocks on the disk. They store everything interesting about a file: its contents, its owner, group, when it was created, when it was modified, when it was last accessed. Everything, that is, except for the file’s name. An oversight? No, it’s a deliberate design decision.

Filenames are stored in a special filetype called directories, which point to inodes. An inode may reside in more than one directory. Unix calls this a “hard link,” which is supposedly one of UFS’s big advantages: the ability to have a single file appear in two places. In practice, hard links are a debugging nightmare. You copy data into a file, and all of a sudden (surprise!) it gets changed, because the file is really hard linked with another file. Which other file? There’s no simple way to tell. Some two-bit moron whose office is three floors up is twiddling your bits. But you can’t find him.

The struggle between good and evil, yin and yang, plays itself out on the disks of Unix’s file system because system administrators must choose, before the system is running, how to divide the disk into bad (inode) space and good (usable file) space. Once this decision is made, it is set in stone. The system cannot trade between good and evil as it runs, but, as we all know from our own lives, too much or too little of either is not much fun. In Unix’s case, when the file system runs out of inodes it won’t put new files on the disk, even if there is plenty of room for them!
This happens all the time when putting Unix File Systems onto floppy disks. So most people tend to err on the side of caution and over-allocate inode space. (Of course, that means that they run out of disk blocks, but still have plenty of inodes left…) Unix manufacturers, in their continued propaganda to convince us Unix is “simple to use,” simply make the default inode space very large. The result is too much allocated inode space, which decreases the usable disk space, thereby increasing the cost per useful megabyte.

UFS maintains a free list of doubly-linked data blocks not currently in use. Unix needs this free list because there isn’t enough online storage space to track all the blocks that are free on the disk at any instant. Unfortunately, it is very expensive to keep the free list consistent: to create a new file, the kernel needs to find a block B on the free list, remove the block from the free list by fiddling with the pointers on the blocks in front of and behind B, and then create a directory entry that points to the inode of the newly un-freed block. To ensure files are not lost or corrupted, the operations must be performed atomically and in order; otherwise data can be lost if the computer crashes while the update is taking place. (Interrupting these sorts of operations can be like interrupting John McEnroe during a serve: both yield startling and unpredictable results.)

No matter! The people who designed the Unix File System didn’t think that the computer would crash very often. Rather than taking the time to design UFS so that it would run fast and keep the disk consistent (it is possible to do this), they designed it simply to run fast. As a result, the hard disk is usually in an inconsistent state. As long as you don’t crash during one of these moments, you’re fine. Orderly Unix shutdowns cause no problems. What about power failures and glitches? What about goonball technicians and other incompetent people unplugging the wrong server in the machine room? What about floods in the sewers of Chicago? Well, you’re left with a wet pile of noodles where your file system used to be.

The tool that tries to rebuild your file system from those wet noodles is fsck (pronounced “F-sick”), the file system consistency checker. It scans the entire file system looking for damage that a crashing Unix typically exacts on its disk. Usually fsck can recover the damage. Sometimes it can’t. (If you’ve been having intermittent hardware failures, SCSI termination problems, and incomplete block transfers, frequently it can’t.) In any event, fsck can take 5, 10, or 20 minutes to find out. During this time, Unix is literally holding your computer hostage.
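
The earlier question about hard links ("Which other file? There's no simple way to tell") can be made concrete. What follows is only a sketch, not anything from the original text: it assumes a POSIX-style system and Python's standard `os` module, and the function names `find_hard_links` and `link_count` are made up for illustration. Two names refer to the same file exactly when they share a device number and an inode number, so the only way to find the other names is to walk a directory tree and compare every entry.

```python
import os
import tempfile


def link_count(path):
    """Return how many directory entries point at this file's inode."""
    return os.lstat(path).st_nlink


def find_hard_links(target, search_root):
    """Return every path under search_root that is the same file as target.

    Names are the same file when their (device, inode) pairs match.
    Hard links cannot cross file systems, so the device check keeps an
    unrelated file with the same inode number on another disk out of
    the result.
    """
    wanted = os.lstat(target)
    key = (wanted.st_dev, wanted.st_ino)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(search_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if (info.st_dev, info.st_ino) == key:
                matches.append(path)
    return sorted(matches)


def demo():
    """Create a file with a second name, then go looking for both names."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "report.txt")
        os.mkdir(os.path.join(root, "three_floors_up"))
        other = os.path.join(root, "three_floors_up", "bits.txt")
        with open(original, "w") as handle:
            handle.write("my data\n")
        os.link(original, other)  # second name, same inode
        print("link count:", link_count(original))
        for path in find_hard_links(original, root):
            print("same file:", path)


if __name__ == "__main__":
    demo()
```

The link count tells you that another name exists, but not where. Finding it means scanning every file under the search root, and the culprit may live in a directory you never thought to search or are not allowed to read.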

Here’s a message that was forwarded to UNIX-HATERS by MLY; it originally appeared on the Usenet newsgroup comp.arch in July 1990:

Date: 13 Jul 90 16:58:55 GMT
From: aglew@oberon.crhc.uiuc.edu (Andy Glew)²
Subject: Fast Re-booting
Newsgroups: comp.arch

A few years ago a customer gave us a 30 second boot after power cycle requirement, for a real-time OS. They wanted 10.

This DECstation 3100, with 16MB of memory, and an approximately 300Mb local SCSI disk, took 8:19 (eight minutes and nineteen seconds) to reboot after powercycle. That included fsck’ing the disk. Time measured from the time I flicked the switch to the time I could log in.

² Forwarded to UNIX-HATERS by Richard Mlynarik.