Finding files

Here's a one-line story: a client told me he had just installed a program, but couldn't find its configuration files.

Most Unix-like systems come with two utilities that at first sight do the same thing: find and locate. Their apparent similarity trips up most new shell users at one point or another. When the difference between them is not well understood, the tendency is to pick one on a whim, and then be startled by the results.

What tool was my client using to locate his files? Right, locate. Why couldn't locate find them? Well, locate is a utility that queries a precomputed database that gets updated only periodically. On a Mac, the interval is set to once a week. No wonder it can't find recent files! If you insist, you can force locate to update its database (on a Mac you would run sudo /usr/libexec/locate.updatedb). But guess what? locate still won't find my client's files. The reason is that locate only indexes world-readable files, and it just so happens that my client's configuration files were installed in a directory readable only by its owner, as they should be.

Why can locate access only a restricted set of files? Because its database is built under the unprivileged account of the special user nobody, to which only files readable by user nobody, group nobody, or the world are visible. Is locate useful at all then? Sure, it is fast, as in super fast: an optimized table lookup. What about the restrictions related to file permissions? Well, remember that on most POSIX systems files are world-readable by default (yes, your home directory too, and if you're on a multi-user system, you might want to change that right now with chmod 750 "$HOME"; restricting the top-level directory is enough, and it avoids the side effect of chmod -R, which would mark every file underneath as executable). So while locate won't find every file on your system, it will find most of them.
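To make the permission rule concrete, here is a minimal sketch that flags directories an unprivileged account such as nobody could not enter. It assumes that a directory needs both the "other read" and "other execute" bits for its contents to be indexed; the directory names are made up for the demonstration and everything happens in a throwaway temporary directory.

```shell
# Throwaway sandbox; "public" and "private" are hypothetical names.
demo=$(mktemp -d)
mkdir "$demo/public" "$demo/private"
chmod 755 "$demo/public"    # world-readable and traversable
chmod 700 "$demo/private"   # owner only

# -perm -005 matches when both "other read" and "other execute" are set;
# negating it lists the directories closed to everyone but owner and group.
hidden=$(find "$demo/public" "$demo/private" -type d ! -perm -005)

echo "$hidden"
```

Only the private directory is printed. Pointing the same find expression at your home directory gives a rough idea of which parts of it a locate database built by an unprivileged user would miss.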

Now, contrast locate with find, a utility that recursively descends a directory tree and offers powerful pattern matching. Its operation is expensive and will take some time to complete, but your file will be found as long as you are allowed to read the directories leading to it (run find with sudo if you are not). To find a file by name that can be anywhere on the filesystem, you would type find / -name 'filename'.
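As a small illustration of find's name matching, the sketch below builds a temporary directory tree and searches it. The file and directory names (myapp.conf, settings.ini) are invented for the example; only the find options themselves are standard.

```shell
# Build a throwaway tree to search (all names here are hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/etc/myapp" "$demo_dir/home/user/.config/myapp"
touch "$demo_dir/etc/myapp/myapp.conf" \
      "$demo_dir/home/user/.config/myapp/settings.ini"

# Exact name match, descending from the given starting directory.
found_conf=$(find "$demo_dir" -name 'myapp.conf')

# Quote the glob so the shell hands the pattern to find unexpanded;
# -type f restricts the matches to regular files.
found_ini=$(find "$demo_dir" -type f -name '*.ini')

echo "$found_conf"
echo "$found_ini"

# When starting from /, find complains about every directory it cannot
# read; discarding stderr hides that noise:
#   find / -name 'myapp.conf' 2>/dev/null
```

Each command prints the full path of the matching file, relative to the starting directory you gave it.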

With this understanding, you now know how to proceed when looking for files. Start off with a lightning-fast locate, and if it fails, don't panic, just switch to find. And if you're on a Mac and you find yourself doing this often, you might want to update your precomputed database more than once a week, say every day.
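The locate-first, find-second routine can be wrapped in a small shell function. This is only a sketch: find_file is a hypothetical name, not a standard command, and it glosses over one difference between the tools, namely that locate matches the name anywhere in the path while find -name matches the file's base name.

```shell
# Hypothetical helper: try the fast database lookup first, then fall
# back to a real filesystem traversal.
# Usage: find_file NAME [START_DIR]
find_file() {
    name=$1
    start_dir=${2:-/}
    hits=''

    # locate may be missing, or its database may not have been built yet.
    if command -v locate >/dev/null 2>&1; then
        hits=$(locate "$name" 2>/dev/null) || hits=''
    fi

    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
    else
        # Slow but thorough; permission errors are silenced.
        find "$start_dir" -name "$name" 2>/dev/null
    fi
}
```

Calling find_file myapp.conf searches the whole filesystem when locate comes up empty; passing a second argument, as in find_file myapp.conf "$HOME", narrows the fallback traversal to that directory.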

In the terminal, type:

open /System/Library/LaunchDaemons/com.apple.locate.plist

Find the StartCalendarInterval section and delete the Weekday key along with its integer value, so that the section reads:

<key>StartCalendarInterval</key>
     <dict>
          <key>Hour</key>
          <integer>3</integer>
          <key>Minute</key>
          <integer>15</integer>
     </dict>

Afraid of breaking something? Fear not: validate your changes with:

plutil /System/Library/LaunchDaemons/com.apple.locate.plist

The output confirms your plist is valid:

/System/Library/LaunchDaemons/com.apple.locate.plist: OK

Now let’s embark on a short UNIX archeological journey replete with geeky miscellanea. The first thing to note is that find is actually what locate uses internally when building its database on schedule. If you suspect some promiscuity between the two, you'd be spot on: locate was born out of find. It was initially called fastfind, and it was a refinement that James A. Woods introduced into the standard find shipping with Berkeley UNIX (BSD). His ideas are laid out in a paper called Finding Files Fast. Interestingly, he thought there was no need to introduce a new tool (and a new man page), arguing it would be wiser to leverage the existing find, supercharging it with a fast mode. If you ran find with two arguments, it would run as before, traversing the filesystem, but if you gave it only one argument (the search string), it would query a precomputed database, dramatically speeding up the operation. Note also that in his mind, the database would get updated on a daily basis. For the hardcore UNIX archeologists among you, the source code can be perused online, with the fastfind section starting on line 797.

The result of this change was that the signature of the find command became ambiguous and broke POSIX requirements. Suddenly, find foo was no longer equivalent to find foo -print. In the first form, it would query the database; in the second, it would not. So in a later version of Berkeley UNIX (BSD), the fastfind part of find was extracted and made into a separate utility: locate. The GNU project did likewise in findutils, the package that bundles find, xargs, and locate.

The UNIX-HATERS Handbook correctly points out that find wasn’t designed to work with humans, but as a companion utility to cpio, a Unix backup utility program. Both utilities were written by Dick Haight while working in AT&T's Unix Support Group, appearing in 1977 in PWB/UNIX 1.0, the “Programmer's Workbench” system for use within AT&T. It was first released outside of AT&T as part of System III Unix in 1981. In the early days, its peculiarities aggravated some UNIX users, who complained about the seemingly useless -print option or lamented that, for a certain period, it didn't support symlinks. Today, most issues with find have been addressed, but reading its man page is still a pleasure for none but die-hard UNIX hackers. For the rest of us, a web interface can come in handy in those instances when we want to leverage find's full power. Most of the time though, all you need to remember is:

find <directory> -name <filename>