I'm trying to run a find command for all JavaScript files, but how do I exclude a specific directory?
Here is the find code we're using.
TLDR: understand your root directories and tailor your search from there, using the -path <excluded_path> -prune -o
option. Do not include a trailing /
at the end of the excluded path.
Example:
find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
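Applied to the original question (JavaScript files, one directory excluded), the same pattern can be tried safely in a throwaway sandbox. This is only a sketch: the directory and file names (src, node_modules, app.js, dep.js) are made up for illustration and are not taken from the question.

```shell
# Hypothetical sandbox; all names below are invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/node_modules/lib"
touch "$demo/src/app.js" "$demo/node_modules/lib/dep.js" "$demo/README.txt"

# No trailing slash: ./node_modules matches -path, so -prune stops the descent.
pruned=$(cd "$demo" && find . -path ./node_modules -prune -o -name '*.js' -print | LC_ALL=C sort)

# Trailing slash: the pattern matches nothing, so nothing is pruned
# (GNU find additionally prints a warning on stderr, silenced here).
unpruned=$(cd "$demo" && find . -path ./node_modules/ -prune -o -name '*.js' -print 2>/dev/null | LC_ALL=C sort)

printf 'without trailing slash:\n%s\n' "$pruned"
printf 'with trailing slash:\n%s\n' "$unpruned"
rm -rf "$demo"
```

In the second run the file under node_modules shows up again, which is the small-scale version of the slow search described below.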
To use find effectively, I believe it is imperative to have a good understanding of your file system's directory structure. On my home computer I have multi-TB hard drives, with about half of that content backed up using rsnapshot (i.e., rsync). Although I back up to a physically independent (duplicate) drive, it is mounted under my system root (/) directory, at /mnt/Backups/rsnapshot_backups/:
/mnt/Backups/
└── rsnapshot_backups/
├── hourly.0/
├── hourly.1/
├── ...
├── daily.0/
├── daily.1/
├── ...
├── weekly.0/
├── weekly.1/
├── ...
├── monthly.0/
├── monthly.1/
└── ...
The /mnt/Backups/rsnapshot_backups/
directory currently occupies ~2.9 TB, with ~60M files and folders; simply traversing those contents takes time:
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find /mnt/Backups/rsnapshot_backups | wc -l
60314138 ## 60.3M files, folders
34:07.30 ## 34 min
time du /mnt/Backups/rsnapshot_backups -d 0
3112240160 /mnt/Backups/rsnapshot_backups ## 3.1 TB
33:51.88 ## 34 min
time rsnapshot du ## << more accurate re: rsnapshot footprint
2.9T /mnt/Backups/rsnapshot_backups/hourly.0/
4.1G /mnt/Backups/rsnapshot_backups/hourly.1/
...
4.7G /mnt/Backups/rsnapshot_backups/weekly.3/
2.9T total ## 2.9 TB, per sudo rsnapshot du (more accurate)
2:34:54 ## 2 hr 35 min
Thus, anytime I need to search for a file on my /
(root) partition, I need to deal with (avoid if possible) traversing my backups partition.
EXAMPLES
Among the approaches variously suggested in this thread (How to exclude a directory in find . command), I find that searches using the accepted answer are much faster -- with caveats.
Solution 1
Let's say I want to find the system file libname-server-2.a
, but I do not want to search through my rsnapshot
backups. To quickly find a system file, use the exclude path /mnt
(i.e., use /mnt
, not /mnt/
, or /mnt/Backups
, or ...):
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
/usr/lib/libname-server-2.a
real 0m8.644s ## 8.6 sec <<< NOTE!
user 0m1.669s
sys 0m2.466s
## As regular user (victoria); I also use an alternate timing mechanism, as
## here I am using 2>/dev/null to suppress "Permission denied" warnings:
$ START="$(date +"%s")" && find 2>/dev/null / -path /mnt -prune -o \
-name "*libname-server-2.a*" -print; END="$(date +"%s")"; \
TIME="$((END - START))"; printf 'find command took %s sec\n' "$TIME"
/usr/lib/libname-server-2.a
find command took 3 sec ## ~3 sec <<< NOTE!
... finds that file in just a few seconds, while this takes much longer (appearing to recurse through all of the "excluded" directories):
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find / -path /mnt/ -prune -o -name "*libname-server-2.a*" -print
find: warning: -path /mnt/ will not match anything because it ends with /.
/usr/lib/libname-server-2.a
real 33m10.658s ## 33 min 11 sec (~231-663x slower!)
user 1m43.142s
sys 2m22.666s
## As regular user (victoria); I also use an alternate timing mechanism, as
## here I am using 2>/dev/null to suppress "Permission denied" warnings:
$ START="$(date +"%s")" && find 2>/dev/null / -path /mnt/ -prune -o \
-name "*libname-server-2.a*" -print; END="$(date +"%s")"; \
TIME="$((END - START))"; printf 'find command took %s sec\n' "$TIME"
/usr/lib/libname-server-2.a
find command took 1775 sec ## 29.6 min
Solution 2
The other solution offered in this thread (SO#4210042) also performs poorly:
## As sudo (#), to avoid numerous "Permission denied" warnings:
time find / -name "*libname-server-2.a*" -not -path "/mnt"
/usr/lib/libname-server-2.a
real 33m37.911s ## 33 min 38 sec (~235x slower)
user 1m45.134s
sys 2m31.846s
time find / -name "*libname-server-2.a*" -not -path "/mnt/*"
/usr/lib/libname-server-2.a
real 33m11.208s ## 33 min 11 sec
user 1m22.185s
sys 2m29.962s
SUMMARY | CONCLUSIONS
Use the approach illustrated in "Solution 1"
find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
i.e.
... -path <excluded_path> -prune -o ...
noting that whenever you add the trailing / to the excluded path, the pattern no longer matches anything (see the find warning shown in Solution 1), so nothing is pruned and find descends into all of the /mnt/* directories -- which in my case, because of the /mnt/Backups/rsnapshot_backups/* subdirectories, adds ~2.9 TB of files to search! Without the trailing /, the search should complete almost immediately (within seconds).
"Solution 2" (... -not -path <exclude path> ...) likewise appears to search recursively through the excluded directories: excluded matches are not returned, but the time to traverse them is still spent.
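The difference in traversal can be made visible on a small scale. The sketch below is an instrumented variant, not the commands timed above: it puts an always-true -exec echo at the front of the expression so that every entry find actually examines produces one line, then counts those lines. The sandbox layout (keep, skip, deep, deeper) is hypothetical, and -not assumes GNU or BSD find.

```shell
# Hypothetical sandbox: one directory to keep, one deeply nested directory to skip.
demo=$(mktemp -d)
mkdir -p "$demo/keep" "$demo/skip/deep/deeper"
touch "$demo/keep/a.js" "$demo/skip/b.js" "$demo/skip/deep/c.js" "$demo/skip/deep/deeper/d.js"

# The leading "-exec echo {} \;" fires once for every entry find examines.
# With -prune, find stops at ./skip and never looks below it.
visited_prune=$(cd "$demo" && find . -exec echo {} \; -path ./skip -prune -o -name '*.js' | wc -l | tr -d ' ')

# With -not -path, every entry under ./skip is still visited and then rejected.
visited_not=$(cd "$demo" && find . -exec echo {} \; -not -path './skip/*' -name '*.js' | wc -l | tr -d ' ')

printf 'entries examined with -prune:     %s\n' "$visited_prune"
printf 'entries examined with -not -path: %s\n' "$visited_not"
rm -rf "$demo"
```

On a tree this small the cost is invisible, but the count grows with everything stored under the excluded directory, which is consistent with the timings reported above.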
Searching within those rsnapshot
backups:
To find a file in one of my hourly/daily/weekly/monthly rsnapshot backups:
$ START="$(date +"%s")" && find 2>/dev/null /mnt/Backups/rsnapshot_backups/daily.0 -name '*04t8ugijrlkj.jpg'; END="$(date +"%s")"; TIME="$((END - START))"; printf 'find command took %s sec\n' "$TIME"
/mnt/Backups/rsnapshot_backups/daily.0/snapshot_root/mnt/Vancouver/temp/04t8ugijrlkj.jpg
find command took 312 sec ## 5.2 minutes: despite the apparent rsnapshot size
## (~4 GB), it is in fact searching through ~2.9 TB
Excluding a nested directory:
Here, I want to exclude a nested directory, e.g. /mnt/Vancouver/projects/ie/claws/data/*
when searching from /mnt/Vancouver/projects/
:
$ time find . -iname '*test_file*'
./ie/claws/data/test_file
./ie/claws/test_file
0:01.97
$ time find . -path '*/data' -prune -o -iname '*test_file*' -print
./ie/claws/test_file
0:00.07
Aside: Adding -print
at the end of the command suppresses the printout of the excluded directory:
$ find / -path /mnt -prune -o -name "*libname-server-2.a*"
/mnt
/usr/lib/libname-server-2.a
$ find / -path /mnt -prune -o -name "*libname-server-2.a*" -print
/usr/lib/libname-server-2.a
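The reason is that when the expression contains no action other than -prune, find applies an implicit -print to the whole expression, and the left-hand side (-path ... -prune) evaluates to true for the pruned directory itself. A minimal sandbox sketch, with made-up names (skip, lib, libdemo.a):

```shell
# Hypothetical sandbox; names are invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/skip" "$demo/lib"
touch "$demo/skip/libdemo.a" "$demo/lib/libdemo.a"

# No explicit action: the implicit -print covers the whole expression,
# so the pruned directory itself is listed.
implicit=$(cd "$demo" && find . -path ./skip -prune -o -name 'libdemo.a' | LC_ALL=C sort)

# Explicit -print binds only to the right-hand side of -o.
explicit=$(cd "$demo" && find . -path ./skip -prune -o -name 'libdemo.a' -print | LC_ALL=C sort)

printf 'implicit print:\n%s\n' "$implicit"
printf 'explicit -print:\n%s\n' "$explicit"
rm -rf "$demo"
```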
I was using find
to provide a list of files for xgettext
, and wanted to omit a specific directory and its contents. I tried many permutations of -path
combined with -prune
but was unable to fully exclude the directory which I wanted gone.
Although I was able to ignore the contents of the directory which I wanted ignored, find
then returned the directory itself as one of the results, which caused xgettext
to crash as a result (doesn't accept directories; only files).
My solution was to simply use grep -v
to skip the directory that I didn't want in the results:
find /project/directory -iname '*.php' -or -iname '*.phtml' | grep -iv '/some/directory' | xargs xgettext
Whether or not there is an argument for find
that will work 100%, I cannot say for certain. Using grep
was a quick and easy solution after some headache.
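For comparison, a -prune expression that ends in an explicit -print and is restricted with -type f should list only regular files, leaving out both the excluded directory and its contents. This is a sketch under assumptions: the layout (app, vendor, index.php and so on) is hypothetical, and the final hand-off to xargs xgettext is left out because xgettext may not be installed.

```shell
# Hypothetical project layout; "vendor" stands in for the directory to omit.
demo=$(mktemp -d)
mkdir -p "$demo/app" "$demo/vendor/pkg"
touch "$demo/app/index.php" "$demo/app/view.phtml" "$demo/vendor/pkg/lib.php"

# The parentheses make -type f and -print apply to both -iname tests;
# without them, -print would bind only to the last -iname.
files=$(cd "$demo" && find . -path ./vendor -prune -o \
    \( -iname '*.php' -o -iname '*.phtml' \) -type f -print | LC_ALL=C sort)

printf '%s\n' "$files"
rm -rf "$demo"
```

The resulting list contains no directories, so it could be piped to a tool that accepts only files.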
find . \( -path '.**/.git' -o -path '.**/.hg' \) -prune -o -name '*.js' -print
The example above finds all *.js
files under the current directory, excluding folders .git
and .hg
, no matter how deep these .git
and .hg
folders are.
Note: this also works:
find . \( -path '.*/.git' -o -path '.*/.hg' \) -prune -o -name '*.js' -print
but I prefer the **
notation for consistency with some other tools which would be off topic here.
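A quick sandbox check that the two spellings behave the same. In find's -path patterns a single * already matches across /, so ** adds nothing beyond readability. The layout below (a top-level .git, a nested .hg, and a few .js files) is hypothetical.

```shell
# Hypothetical layout: .git at the top level, .hg two levels down.
demo=$(mktemp -d)
mkdir -p "$demo/.git/hooks" "$demo/sub/mod/.hg" "$demo/sub/src"
touch "$demo/.git/hooks/pre.js" "$demo/sub/mod/.hg/store.js" \
      "$demo/sub/src/main.js" "$demo/top.js"

double=$(cd "$demo" && find . \( -path '.**/.git' -o -path '.**/.hg' \) -prune -o -name '*.js' -print | LC_ALL=C sort)
single=$(cd "$demo" && find . \( -path '.*/.git' -o -path '.*/.hg' \) -prune -o -name '*.js' -print | LC_ALL=C sort)

printf 'with **:\n%s\n' "$double"
printf 'with *:\n%s\n' "$single"
rm -rf "$demo"
```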
I find the following easier to reason about than other proposed solutions:
find build -not \( -path build/external -prune \) -name \*.js
# you can also exclude multiple paths
find build -not \( -path build/external -prune \) -not \( -path build/blog -prune \) -name \*.js
Important Note: the paths you type after -path
must exactly match what find
would print without the exclusion. If this sentence confuses you, just make sure to use full paths throughout the whole command, like this: find /full/path -not \( -path /full/path/exclude/this -prune \) ...
. See note [1] if you'd like a better understanding.
Inside \(
and \)
is an expression that will match exactly build/external
(see important note above), and will, on success, avoid traversing anything below. This is then grouped as a single expression with the escaped parenthesis, and prefixed with -not
which will make find
skip anything that was matched by that expression.
One might ask if adding -not
will not make all other files hidden by -prune
reappear, and the answer is no. The way -prune
works is that once it is applied to a directory, find never descends into it, so everything below that directory is permanently ignored.
This comes from an actual use case, where I needed to call yui-compressor on some files generated by wintersmith, but leave out other files that need to be sent as-is.
Note [1]: If you want to exclude /tmp/foo/bar
and you run find like this "find /tmp \(...
" then you must specify -path /tmp/foo/bar
. If on the other hand you run find like this cd /tmp; find . \(...
then you must specify -path ./foo/bar
.
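The matching rule in note [1] can be checked in a sandbox. The sketch below reuses the build/external name from the example above; the other names (site, vendor.js, main.js) are made up. It runs the same exclusion three ways: with a pattern that matches the starting point's form, with one that does not, and with absolute paths.

```shell
# Hypothetical sandbox mirroring the build/external example.
demo=$(mktemp -d)
mkdir -p "$demo/build/external" "$demo/build/site"
touch "$demo/build/external/vendor.js" "$demo/build/site/main.js"

# Starting point "build": find generates build/external, so the pattern matches.
relative=$(cd "$demo" && find build -not \( -path build/external -prune \) -name '*.js' | LC_ALL=C sort)

# Starting point ".": find generates ./build/external, so the same pattern
# matches nothing and the directory is not excluded.
mismatch=$(cd "$demo" && find . -not \( -path build/external -prune \) -name '*.js' | LC_ALL=C sort)

# Absolute paths on both sides match regardless of the current directory.
absolute=$(find "$demo/build" -not \( -path "$demo/build/external" -prune \) -name '*.js')

printf 'relative:\n%s\nmismatch:\n%s\nabsolute:\n%s\n' "$relative" "$mismatch" "$absolute"
rm -rf "$demo"
```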
Use the -prune
switch. For example, if you want to exclude the misc
directory just add a -path ./misc -prune -o
to your find command:
find . -path ./misc -prune -false -o -name '*.txt'
Here is an example with multiple directories:
find . -type d \( -path ./dir1 -o -path ./dir2 -o -path ./dir3 \) -prune -false -o -name '*.txt'
Here we exclude ./dir1, ./dir2 and ./dir3 in the current directory. In find
expressions, -prune is an action that acts on the criteria -path ./dir1 -o -path ./dir2 -o -path ./dir3
(if dir1 or dir2 or dir3), ANDed with -type d
.
To exclude directory name at any level, use -name
:
find . -type d \( -name node_modules -o -name dir2 -o -path ./name \) -prune -false -o -name '*.json'
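To see the difference between the two tests, the sketch below prunes the same directory name once with -name (any depth) and once with -path (one exact location). The layout is hypothetical, and -false assumes GNU or BSD find.

```shell
# Hypothetical layout: node_modules at the top level and nested under pkg.
demo=$(mktemp -d)
mkdir -p "$demo/node_modules/a" "$demo/pkg/node_modules/b" "$demo/pkg/conf"
touch "$demo/package.json" "$demo/node_modules/a/package.json" \
      "$demo/pkg/node_modules/b/package.json" "$demo/pkg/conf/settings.json"

# -name compares only the last path component, so every node_modules is pruned.
by_name=$(cd "$demo" && find . -type d -name node_modules -prune -false -o -name '*.json' | LC_ALL=C sort)

# -path compares the whole path, so only the top-level directory is pruned.
by_path=$(cd "$demo" && find . -type d -path ./node_modules -prune -false -o -name '*.json' | LC_ALL=C sort)

printf 'pruned by -name:\n%s\n' "$by_name"
printf 'pruned by -path:\n%s\n' "$by_path"
rm -rf "$demo"
```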
There are plenty of good answers, it just took me some time to understand what each element of the command was for and the logic behind it.
find . -path ./misc -prune -o -name '*.txt' -print
find will start finding files and directories in the current directory, hence the find .
.
The -o
operator stands for a logical OR and separates the two parts of the command:
[ -path ./misc -prune ] OR [ -name '*.txt' -print ]
Any directory or file that is not the ./misc directory will not pass the first test -path ./misc
. But they will be tested against the second expression. If their name corresponds to the pattern *.txt
they get printed, because of the -print
option.
When find reaches the ./misc directory, this directory only satisfies the first expression. So the -prune
option will be applied to it. It tells the find command to not explore that directory. So any file or directory in ./misc will not even be explored by find, will not be tested against the second part of the expression and will not be printed.
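The two branches can be watched directly by replacing -print with a labelled echo on each side of -o. This is an instrumented sketch for illustration, not a command to use in practice, and the sandbox names (misc, docs, readme.txt and so on) are made up.

```shell
# Hypothetical sandbox with one directory to prune and one to search.
demo=$(mktemp -d)
mkdir -p "$demo/misc" "$demo/docs"
touch "$demo/misc/hidden.txt" "$demo/docs/readme.txt" "$demo/docs/logo.png"

# Left branch: runs only for ./misc, which is pruned and labelled.
# Right branch: runs for everything else and labels names matching *.txt.
trace=$(cd "$demo" && find . -path ./misc -prune -exec echo 'pruned:' {} \; \
    -o -name '*.txt' -exec echo 'matched:' {} \; | LC_ALL=C sort)

printf '%s\n' "$trace"
rm -rf "$demo"
```

Note that ./misc/hidden.txt never appears under either label, because find never enters the pruned directory.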