Financial Opportunity

Free Software * Books

Source Code

Linux

Media

Linux Find Command

The find program searches for files that match the specified criteria and optionally executes some function on those files. Find can compare or identify files based on name, location, type, size, creation and many more attributes. The find manual page documents the complete list of options, operators, actions and setting available for the find command. View the info pages for find for additional information and examples.

This page documents common tasks that utilize find and provides examples that use find to solve real problems.

Examples

Viewing And Editing (find -xdev)
Archiving (find -print -depth)
Cleaning Up (find -prune -name -print)
Strange File Names (find -okdir -maxdepth -inum)
Fixing Permissions (find -type -perm)
Classifying Files (find -perm)
Deleting Files (find -mtime -exec)
Making Use of `xargs' (find xargs)
Unusual characters in filenames
Using `-exec' (find -exec)
A more secure version of `-exec' (find -exec)
Using the `-delete' action
Copying A Subset of Files (find -name -prune -o)
Updating A Timestamp File (find -newer -exec)
Using `-printf' and `sort' to compare timestamps
Coping with odd filenames

Viewing And Editing (find -xdev)

Below is an example using find to locate and view all files that end in .c that are under the current directory hierarchy. In this example, the viewing command less is utilized to view the list of file names output by the find command. The find command is enclosed in backquotes so that the shell will replace the enclosed programs(s) with their output.

less `find . -name '*.c'`

To restrict the program to only finding files on the current filesystem, use find -xdev to prevents the command from decending to directories on other file systems.

less `find . -name '*.c' -xdev`

To further refine the filtering process, the output of find could have been piped to xargs and filtered by grep as shown below:

less `find . -name '*.c'| xargs grep -l llo `

For the command above, only the files ending in .c and containing the string "llo" would be displayed.

It should be noted that the command line has a maximum length. Therefore, if there is a need to process a large number of files or items an alternate method is needed. The example below illustrates a method of using xargs to direct input to an output file and then execute a command for each element in the output file:

find /usr/include -name '*.h' | xargs grep -l mode_t > todo
xargs --arg-file=todo emacs

In the example above `xargs' will run `emacs' once for each file listed in the file `todo'.

The next example illustrates how to find and edit the list of files with a single invocation:

find /usr/include -name '*.h' | xargs grep -l mode_t | xargs sh -c 'emacs "$@" < /dev/tty' Emacs

In this example 'xargs' invokes a shell command using `sh -c'. The `$@' in the command line is expanded by the shell to a list of arguments as provided by `xargs'. The single quotes in the command line protect the `$@' against expansion by your interactive shell (which will normally have no arguments and thus expand `$@' to nothing). The capitalised `Emacs' on the command line is used as `$0' by the shell that `xargs' launches.

Archiving (find -print -depth)

You can pass a list of files produced by `find' to a file archiving program. GNU `tar' and `cpio' can both read lists of file names from the standard input - either delimited by nulls (the safe way) or by blanks (the lazy, risky default way). To use null-delimited names, give them the `--null' option. You can store a file archive in a file, write it on a tape, or send it over a network to extract on another machine.

One common use of `find' to archive files is to send a list of the files in a directory tree to `cpio'. Use `-depth' so if a directory does not have write permission for its owner, its contents can still be restored from the archive since the directory's permissions are restored after its contents. Here is an example of doing this using `cpio'; you could use a more complex `find' expression to archive only certain files.

find . -depth -print0 | cpio --create --null --format=crc --file=/dev/nrst0

You could restore that archive using this command:

cpio --extract --null --make-dir --unconditional \ --preserve --file=/dev/nrst0

Here are the commands to do the same things using `tar':

find . -depth -print0 | tar --create --null --files-from=- --file=/dev/nrst0
tar --extract --null --preserve-perm --same-owner \ --file=/dev/nrst0

Here is an example of copying a directory from one machine to another:

find . -depth -print0 | cpio -0o -Hnewc | rsh OTHER-MACHINE "cd `pwd` && cpio -i0dum"

rsh OTHER-MACHINE "cd `pwd` && cpio -i0dum"

Cleaning Up (find -prune -name -print)

This section gives examples of removing unwanted files in various situations. Here is a command to remove the CVS backup files created when an update requires a merge:

find . -name '.#*' -print0 | xargs -0r rm -f

If your `find' command removes directories, you may find that you get a spurious error message when `find' tries to recurse into a directory that has now been removed. Using the `-depth' option will normally resolve this problem.

It is also possible to use the `-delete' action:

find . -depth -name '.#*' -delete

You can run this command to clean out your clutter in `/tmp'. You might place it in the file your shell runs when you log out (`.bash_logout', `.logout', or `.zlogout', depending on which shell you use).

find /tmp -depth -user "$LOGNAME" -type f -delete

To remove old Emacs backup and auto-save files, you can use a command like the following. It is especially important in this case to use null-terminated file names because Emacs packages like the VM mailer often create temporary file names with spaces in them, like `#reply to David J. MacKenzie<1>#'.

find ~ $ -name '*~' -o -name '#*#' $ -print0 | xargs --no-run-if-empty --null rm -vf

Removing old files from `/tmp' is commonly done from `cron':

find /tmp /var/tmp -depth -not -type d -mtime +3 -delete find /tmp /var/tmp -depth -mindepth 1 -type d -empty -delete

The second `find' command above cleans out empty directories depth-first (`-delete' implies `-depth' anyway), hoping that the parents become empty and can be removed too. It uses `-mindepth' to avoid removing `/tmp' itself if it becomes totally empty. arents become empty and can be removed too. It uses `-mindepth' to avoid removing `/tmp' itself if it becomes totally empty.

Lastly, an example of a program that almost certainly does not do what the user intended:

find dirname -delete -name quux

If the user hoped to delete only files named `quux' they will get an unpleasant surprise; this command will attempt to delete everything at or below the starting point `dirname'. This is because `find' evaluates the items on the command line as an expression. The `find' program will normally execute an action if the preceding action succeeds. Here, there is no action or test before the `-delete' so it will always be executed. The `-name quux' test will be performed for files we successfully deleted, but that test has no effect since `-delete' also disables the default `-print' operation. So the above example will probably delete a lot of files the user didn't want to delete.

This command is also likely to do something you did not intend:

find dirname -path dirname/foo -prune -o -delete

Because `-delete' turns on `-depth', the `-prune' action has no effect and files in `dirname/foo' will be deleted too.

Strange File Names (find -okdir -maxdepth -inum)

`find' can help you remove or rename a file with strange characters in its name. People are sometimes stymied by files whose names contain characters such as spaces, tabs, control characters, or characters with the high bit set. The simplest way to remove such files is:

rm -i SOME*PATTERN*THAT*MATCHES*THE*PROBLEM*FILE

`rm' asks you whether to remove each file matching the given pattern. If you are using an old shell, this approach might not work if the file name contains a character with the high bit set; the shell may strip it off. A more reliable way is:

find . -maxdepth 1 TESTS -okdir rm '{}' \;

where TESTS uniquely identify the file. The `-maxdepth 1' option prevents `find' from wasting time searching for the file in any subdirectories; if there are no subdirectories, you may omit it. A good way to uniquely identify the problem file is to figure out its inode number; use

ls -i

Suppose you have a file whose name contains control characters, and you have found that its inode number is 12345. This command prompts you for whether to remove it:

find . -maxdepth 1 -inum 12345 -okdir rm -f '{}' \;

If you don't want to be asked, perhaps because the file name may contain a strange character sequence that will mess up your screen when printed, then use `-execdir' instead of `-okdir'.

If you want to rename the file instead, you can use `mv' instead of `rm':

find . -maxdepth 1 -inum 12345 -okdir mv '{}' NEW-FILE-NAME \;

Fixing Permissions (find -type -perm)

Suppose you want to make sure that everyone can write to the directories in a certain directory tree. Here is a way to find directories lacking either user or group write permission (or both), and fix their permissions:

find . -type d -not -perm -ug=w | xargs chmod ug+w

You could also reverse the operations, if you want to make sure that directories do _not_ have world write permission.

Classifying Files (find -perm)

If you want to classify a set of files into several groups based on different criteria, you can use the comma operator to perform multiple independent tests on the files. Here is an example:

find / -type d $ -perm -o=w -fprint allwrite , \
-perm -o=x -fprint allexec $

echo "Directories that can be written to by everyone:"
cat allwrite
echo ""
echo "Directories with search permissions for everyone:"
cat allexec

`find' has only to make one scan through the directory tree (which is one of the most time consuming parts of its work).

The tools in the findutils package, and in particular `find', have a large number of options. This means that quite often, there is more than one way to do things. Some of the options and facilities only exist for compatibility with other tools, and findutils provides improved ways of doing things.

Deleting Files (find -mtime -exec)

One of the most common tasks that `find' is used for is locating files that can be deleted. This might include:

* Files last modified more than 3 years ago which haven't been accessed for at least 2 years

* Files belonging to a certain user

* Temporary files which are no longer required

This example concentrates on the actual deletion task rather than on sophisticated ways of locating the files that need to be deleted. We'll assume that the files we want to delete are old files underneath `/var/tmp/stuff'.

The traditional way to delete files in `/var/tmp/stuff' that have not been modified in over 90 days would have been:

find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \;

The above command uses `-exec' to run the `/bin/rm' command to remove each file. This approach works and in fact would have worked in Version 7 Unix in 1979. However, there are a number of problems with this approach.

The most obvious problem with the approach above is that it causes `find' to fork every time it finds a file that needs to delete, and the child process then has to use the `exec' system call to launch `/bin/rm'. All this is quite inefficient. If we are going to use `/bin/rm' to do this job, it is better to make it delete more than one file at a time.

The most obvious way of doing this is to use the shell's command expansion feature:

/bin/rm `find /var/tmp/stuff -mtime +90 -print`

or you could use the more modern form

/bin/rm `find /var/tmp/stuff -mtime +90 -print`

or you could use the more modern form

/bin/rm $(find /var/tmp/stuff -mtime +90 -print)

The commands above are much more efficient than the first attempt. However, there is a problem with them. The shell has a maximum command length which is imposed by the operating system (the actual limit varies between systems). This means that while the command expansion technique will usually work, it will suddenly fail when there are lots of files to delete. Since the task is to delete unwanted files, this is precisely the time we don't want things to go wrong.

Making Use of `xargs' (find xargs)

So, is there a way to be more efficient in the use of `fork()' and `exec()' without running up against this limit? Yes, we can be almost optimally efficient by making use of the `xargs' command. The `xargs' command reads arguments from its standard input and builds them into command lines. We can use it like this:

find /var/tmp/stuff -mtime +90 -print | xargs /bin/rm

For example if the files found by `find' are `/var/tmp/stuff/A', `/var/tmp/stuff/B' and `/var/tmp/stuff/C' then `xargs' might issue the commands

/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B
/bin/rm /var/tmp/stuff/C

The above assumes that `xargs' has a very small maximum command line length. The real limit is much larger but the idea is that `xargs' will run `/bin/rm' as many times as necessary to get the job done, given the limits on command line length.

This usage of `xargs' is pretty efficient, and the `xargs' command is widely implemented (all modern versions of Unix offer it). So far then, the news is all good. However, there is bad news too.

Unusual characters in filenames (find -mtime -print xargs)

Unix-like systems allow any characters to appear in file names with the exception of the ASCII NUL character and the slash. Slashes can occur in path names (as the directory separator) but not in the names of exception of the ASCII NUL character and the slash. Slashes can occur in path names (as the directory separator) but not in the names of actual directory entries. This means that the list of files that `xargs' reads could in fact contain white space characters - spaces, tabs and newline characters. Since by default, `xargs' assumes that the list of files it is reading uses white space as an argument separator, it cannot correctly handle the case where a filename actually includes white space. This makes the default behaviour of `xargs' almost useless for handling arbitrary data.

To solve this problem, GNU findutils introduced the `-print0' action for `find'. This uses the ASCII NUL character to separate the entries in the file list that it produces. This is the ideal choice of separator since it is the only character that cannot appear within a path name. The `-0' option to `xargs' makes it assume that arguments are separated with ASCII NUL instead of white space. It also turns off another misfeature in the default behaviour of `xargs', which is that it pays attention to quote characters in its input. Some versions of `xargs' also terminate when they see a lone `_' in the input, but GNU `find' no longer does that (since it has become an optional behaviour in the Unix standard).

So, putting `find -print0' together with `xargs -0' we get this command:

find /var/tmp/stuff -mtime +90 -print0 | xargs -0 /bin/rm

The result is an efficient way of proceeding that correctly handles all the possible characters that could appear in the list of files to delete. This is good news. However, there is, as I'm sure you're expecting, also more bad news. The problem is that this is not a portable construct; although other versions of Unix (notably BSD-derived ones) support `-print0', it's not universal. So, is there a more universal mechanism?

Using `-exec' (find -exec)

There is indeed a more universal mechanism, which is a slight modification to the `-exec' action. The normal `-exec' action assumes that the command to run is terminated with a semicolon (the semicolon normally has to be quoted in order to protect it from interpretation as the shell command separator). The SVR4 edition of Unix introduced a slight variation, which involves terminating the command with `+' instead: slight variation, which involves terminating the command with `+' instead:

find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \+

The above use of `-exec' causes `find' to build up a long command line and then issue it. This can be less efficient than some uses of `xargs'; for example `xargs' allows new command lines to be built up while the previous command is still executing, and allows you to specify a number of commands to run in parallel. However, the `find ... -exec ... +' construct has the advantage of wide portability. GNU findutils did not support `-exec ... +' until version 4.2.12; one of the reasons for this is that it already had the `-print0' action in any case.

A more secure version of `-exec' (find -exec)

The command above seems to be efficient and portable. However, within it lurks a security problem. The problem is shared with all the commands we've tried in this worked example so far, too. The security problem is a race condition; that is, if it is possible for somebody to manipulate the filesystem that you are searching while you are searching it, it is possible for them to persuade your `find' command to cause the deletion of a file that you can delete but they normally cannot.

The problem occurs because the `-exec' action is defined by the POSIX standard to invoke its command with the same working directory as `find' had when it was started. This means that the arguments which replace the {} include a relative path from `find''s starting point down the file that needs to be deleted. For example,

find /var/tmp/stuff -mtime +90 -exec /bin/rm {} \+

might actually issue the command:

/bin/rm /var/tmp/stuff/A /var/tmp/stuff/B /var/tmp/stuff/passwd

Notice the file `/var/tmp/stuff/passwd'. Likewise, the command:

cd /var/tmp && find stuff -mtime +90 -exec /bin/rm {} \+

might actually issue the command:

/bin/rm stuff/A stuff/B stuff/passwd

If an attacker can rename `stuff' to something else (making use of their write permissions in `/var/tmp') they can replace it with a symbolic link to `/etc'. That means that the `/bin/rm' command will be invoked on `/etc/passwd'. If you are running your `find' command as root, the attacker has just managed to delete a vital file. All they needed to do to achieve this was replace a subdirectory with a symbolic link at the vital moment.

There is however, a simple solution to the problem. This is an action which works a lot like `-exec' but doesn't need to traverse a chain of directories to reach the file that it needs to work on. This is the `-execdir' action, which was introduced by the BSD family of operating systems. The command,

find /var/tmp/stuff -mtime +90 -execdir /bin/rm {} \+

might delete a set of files by performing these actions:

1. Change directory to /var/tmp/stuff/foo

2. Invoke `/bin/rm ./file1 ./file2 ./file3'

3. Change directory to /var/tmp/stuff/bar

4. Invoke `/bin/rm ./file99 ./file100 ./file101'

This is a much more secure method. We are no longer exposed to a race condition. For many typical uses of `find', this is the best strategy. It's reasonably efficient, but the length of the command line is limited not just by the operating system limits, but also by how many files we actually need to delete from each directory.

Is it possible to do any better? In the case of general file processing, no. However, in the specific case of deleting files it is indeed possible to do better.

Using the `-delete' action

The most efficient and secure method of solving this problem is to use the `-delete' action: he most efficient and secure method of solving this problem is to use the `-delete' action:

find /var/tmp/stuff -mtime +90 -delete

This alternative is more efficient than any of the `-exec' or `-execdir' actions, since it entirely avoids the overhead of forking a new process and using `exec' to run `/bin/rm'. It is also normally more efficient than `xargs' for the same reason. The file deletion is performed from the directory containing the entry to be deleted, so the `-delete' action has the same security advantages as the `-execdir' action has.

The `-delete' action was introduced by the BSD family of operating systems.

Improving things still further

Is it possible to improve things still further? Not without either modifying the system library to the operating system or having more specific knowledge of the layout of the filesystem and disk I/O subsystem, or both.

The `find' command traverses the filesystem, reading directories. It then issues a separate system call for each file to be deleted. If we could modify the operating system, there are potential gains that could be made:

* We could have a system call to which we pass more than one filename for deletion

* Alternatively, we could pass in a list of inode numbers (on GNU/Linux systems, `readdir()' also returns the inode number of each directory entry) to be deleted.

The above possibilities sound interesting, but from the kernel's point of view it is difficult to enforce standard Unix access controls for such processing by inode number. Such a facility would probably need to be restricted to the superuser.

Another way of improving performance would be to increase the parallelism of the process. For example if the directory hierarchy we are searching is actually spread across a number of disks, we might somehow be able to arrange for `find' to process each disk in parallel. re searching is actually spread across a number of disks, we might somehow be able to arrange for `find' to process each disk in parallel. In practice GNU `find' doesn't have such an intimate understanding of the system's filesystem layout and disk I/O subsystem.

However, since the system administrator can have such an understanding they can take advantage of it like so:

find /var/tmp/stuff1 -mtime +90 -delete &
find /var/tmp/stuff2 -mtime +90 -delete &
find /var/tmp/stuff3 -mtime +90 -delete &
find /var/tmp/stuff4 -mtime +90 -delete &
wait

In the example above, four separate instances of `find' are used to search four subdirectories in parallel. The `wait' command simply waits for all of these to complete. Whether this approach is more or less efficient than a single instance of `find' depends on a number of things:

* Are the directories being searched in parallel actually on separate disks? If not, this parallel search might just result in a lot of disk head movement and so the speed might even be slower.

* Other activity - are other programs also doing things on those disks?

Conclusion

The fastest and most secure way to delete files with the help of `find' is to use `-delete'. Using `xargs -0 -P N' can also make effective use of the disk, but it is not as secure.

In the case where we're doing things other than deleting files, the most secure alternative is `-execdir ... +', but this is not as portable as the insecure action `-exec ... +'.

The `-delete' action is not completely portable, but the only other possibility which is as secure (`-execdir') is no more portable. The most efficient portable alternative is `-exec ...+', but this is insecure and isn't supported by versions of GNU findutils prior to 4.2.12.

Copying A Subset of Files (find -name -prune -o)

Suppose you want to copy some files from `/source-dir' to `/dest-dir', but there are a small number of files in `/source-dir' you don't want to copy.

One option of course is `cp /source-dir /dest-dir' followed by deletion of the unwanted material under `/dest-dir'. But often that can be inconvenient, because for example we would have copied a large amount of extraneous material, or because `/dest-dir' is too small. Naturally there are many other possible reasons why this strategy may be unsuitable.

So we need to have some way of identifying which files we want to copy, and we need to have a way of copying that file list. The second part of this condition is met by `cpio -p'. Of course, we can identify the files we wish to copy by using `find'. Here is a command that solves our problem:

cd /source-dir
find . -name '.snapshot' -prune -o $ \! -name '*~' -print0 $ |
cpio -pmd0 /dest-dir

The first part of the `find' command here identifies files or directories named `.snapshot' and tells `find' not to recurse into them (since they do not need to be copied). The combination `-name '.snapshot' -prune' yields false for anything that didn't get pruned, but it is exactly those files we want to copy. Therefore we need to use an OR (`-o') condition to introduce the rest of our expression. The remainder of the expression simply arranges for the name of any file not ending in `~' to be printed.

Using `-print0' ensures that white space characters in file names do not pose a problem. The `cpio' command does the actual work of copying files. The program as a whole fails if the `cpio' program returns nonzero. If the `find' command returns non-zero on the other hand, the Unix shell will not diagnose a problem (since `find' is not the last command in the pipeline).

Updating A Timestamp File (find -newer -exec)

Suppose we have a directory full of files which is maintained with a set of automated tools; perhaps one set of tools updates them and another set of tools uses the result. In this situation, it might be useful for the second set of tools to know if the files have recently been changed. It might be useful, for example, to have a 'timestamp' file which gives the timestamp on the newest file in the collection.

We can use `find' to achieve this, but there are several different ways to do it.

Updating the Timestamp The Wrong Way

The obvious but wrong answer is just to use `-newer':

find subdir -newer timestamp -exec touch -r {} timestamp \;

This does the right sort of thing but has a bug. Suppose that two files in the subdirectory have been updated, and that these are called `file1' and `file2'. The command above will update `timestamp' with the modification time of `file1' or that of `file2', but we don't know which one. Since the timestamps on `file1' and `file2' will in general be different, this could well be the wrong value.

One solution to this problem is to modify `find' to recheck the modification time of `timestamp' every time a file is to be compared against it, but that will reduce the performance of `find'.

Using the test utility to compare timestamps

The `test' command can be used to compare timestamps:

find subdir -exec test {} -nt timestamp \; -exec touch -r {} timestamp \;

This will ensure that any changes made to the modification time of `timestamp' that take place during the execution of `find' are taken into account. This resolves our earlier problem, but unfortunately this runs much more slowly.

A combined approach

We can of course still use `-newer' to cut down on the number of calls to `test':

find subdir -newer timestamp -and \
-exec test {} -nt timestamp \; -and \
-exec touch -r {} timestamp \;

Here, the `-newer' test excludes all the files which are definitely older than the timestamp, but all the files which are newer than the old value of the timestamp are compared against the current updated timestamp.

This is indeed faster in general, but the speed difference will depend on how many updated files there are.

Using `-printf' and `sort' to compare timestamps

It is possible to use the `-printf' action to abandon the use of `test' entirely:

newest=$(find subdir -newer timestamp -printf "%A%p\n" |
     sort -n |
     tail -1 |
     cut -d: -f2- )
touch -r "${newest:-timestamp}" timestamp

The command above works by generating a list of the timestamps and names of all the files which are newer than the timestamp. The `sort', `tail' and `cut' commands simply pull out the name of the file with the largest timestamp value (that is, the latest file). The `touch' command is then used to update the timestamp,

The `"${newest:-timestamp}"' expression simply expands to the value of `$newest' if that variable is set, but to `timestamp' otherwise. This ensures that an argument is always given to the `-r' option of the `touch' command.

This approach seems quite efficient, but unfortunately it has a roblem. Many operating systems now keep file modification time information at a granularity which is finer than one second. Findutils version 4.3.3 and later will print a fractional part with %A@, but older versions will not.

Solving the problem with `make'

Another tool which often works with timestamps is `make'. We can use `find' to generate a `Makefile' file on the fly and then use `make' to update the timestamps:

makefile=$(mktemp)
find subdir \
     $ \! -xtype l $ \
     -newer timestamp \
     -printf "timestamp:: %p\n\ttouch -r %p timestamp\n\n" > "$makefile"
make -f "$makefile"
rm -f "$makefile"

Unfortunately although the solution above is quite elegant, it fails to cope with white space within file names, and adjusting it to do so would require a rather complex shell script.

Coping files with odd filenames

We can fix both of these problems (looping and problems with white space), and do things more efficiently too. The following command works with newlines and doesn't need to sort the list of filenames.

find subdir -newer timestamp -printf "%A@:%p\0" |

perl -0 newest.pl |
xargs --no-run-if-empty --null -i \
find {} -maxdepth 0 -newer timestamp -exec touch -r {} timestamp \;

The first `find' command generates a list of files which are newer than the original timestamp file, and prints a list of them with their timestamps. The `newest.pl' script simply filters out all the filenames which have timestamps which are older than whatever the newest file is:

     #! /usr/bin/perl -0
     my @newest = ();
     my $latest_stamp = undef;
     while (<>) {
       my ($stamp, $name) = split(/:/);
       if (!defined($latest_stamp) || ($tstamp > $latest_stamp)) {
     $latest_stamp = $stamp;
     @newest = ();
     }
     if ($tstamp >= $latest_stamp) {
       push @newest, $name;
       }
     }
     print join("\0", @newest);

This prints a list of zero or more files, all of which are newer than the original timestamp file, and which have the same timestamp as each other, to the nearest second. The second `find' command takes each resulting file one at a time, and if that is newer than the timestamp file, the timestamp is updated.