> I often use the `find` command on Linux and macOS. I just discovered the `parallel` command, and I would like to combine it with `find`.
Parallel processing makes sense when your work is CPU bound (the CPU does the work, and the peripherals are mostly idle), but here you are trying to improve the performance of a task which is I/O bound (the CPU is mostly idle, waiting for a busy peripheral). In this situation, adding parallelism only adds congestion, as multiple tasks end up fighting over the already-scarce I/O bandwidth.
On macOS, the system already indexes all your data anyway (including the contents of word-processing documents, PDFs, email messages, etc.); there's a friendly magnifying glass in the menu bar at the upper right where you can access a much faster and more versatile search, called Spotlight. (Though I agree that some of the more sophisticated controls of `find` are missing; and the "user friendly" design gets in the way for me when it guesses what I want, and guesses wrong.)
Some Linux distros offer a similar facility; I would expect that to be the norm for anything with a GUI these days, though the details will differ between systems.
A more traditional solution on any Unix-like system is the `locate` command, which performs a similar but more limited task: it creates a (very snappy) index of file names, so you can say

```sh
locate fnord
```

to very quickly obtain every file whose name matches `fnord`. The index is simply a copy of the results of a `find` run from last night (or however you schedule the back end to run). The command is already installed on macOS, though you have to enable the back end if you want to use it. (Just run `locate locate` to get further instructions.)
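On the macOS versions I've seen, those instructions boil down to loading the launchd job which builds the database; the exact plist path may differ on your system, so treat this as a sketch rather than gospel:

```sh
# Enable the nightly locate database builder on macOS
# (path reported by `locate locate`; may vary by macOS version)
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.locate.plist
```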
You could build something similar yourself if you find yourself often looking for files with a particular set of permissions and a particular owner, for example (these are not features which `locate` records); just run a nightly (or hourly, etc.) `find` which collects these features into a database -- or even just a text file -- which you can then search nearly instantly.
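For instance, here is a minimal sketch of such a home-grown index. The file name `~/.findex` and the query are made up for illustration; note also that `-printf` is a GNU `find` feature, so on macOS you would need `findutils` from a package manager, or an `-exec stat` workaround:

```sh
# Nightly cron job: record mode, owner, and path for every file
# under $HOME (uses GNU find's -printf; not in BSD/macOS find)
find "$HOME" -type f -printf '%m %u %p\n' 2>/dev/null >"$HOME/.findex.tmp" &&
    mv "$HOME/.findex.tmp" "$HOME/.findex"

# A query is then a near-instant text search;
# e.g. all of alice's mode-600 files:
awk '$1 == "600" && $2 == "alice"' "$HOME/.findex"
```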
For running jobs in parallel, you don't really need GNU `parallel`, though it does offer a number of conveniences and enhancements for many use cases; you already have `xargs -P`. (The `xargs` on macOS, which originates from BSD, is more limited than the GNU `xargs` you'll find on many Linux distributions; but it does have the `-P` option.)
For example, here's how to run eight parallel `find` instances with `xargs -P`:

```sh
printf '%s\n' */ | xargs -I {} -P 8 find {} -name '*.ogg'
```
(This assumes the wildcard doesn't match directories which contain single quotes or newlines or other shenanigans; GNU `xargs` has the `-0` option to fix a large number of corner cases like that; then you'd use `'%s\0'` as the format string for `printf`.)
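For completeness, the NUL-delimited variant would look something like this (written for GNU `xargs`, though macOS's BSD `xargs` also understands `-0`):

```sh
# NUL-delimited input survives spaces, quotes, and even newlines
# in directory names
printf '%s\0' */ | xargs -0 -I {} -P 8 find {} -name '*.ogg'
```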
As the `parallel` documentation readily explains, its general syntax is

```sh
parallel -options command ...
```

where `{}` will be replaced with the current input line (if it is missing, it will be implicitly added at the end of `command ...`), and the (obviously optional) `:::` special token allows you to specify an input source on the command line instead of on standard input.
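To illustrate with a trivial example of my own:

```sh
# These two are equivalent: arguments on the command line via :::,
# or one argument per line on standard input
parallel echo ::: one two three
printf '%s\n' one two three | parallel echo
```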
Anything outside of those special tokens is passed on verbatim, so you can add `find` options to your heart's content just by specifying them literally:

```sh
parallel -j8 find {} -type f -name '*.ogg' ::: */
```
I don't speak `zsh`, but refactored for regular POSIX `sh`, your function could be something like

```sh
ff () {
    parallel -j8 find {} -type f -iname "$2" ::: "$1"
}
```
though I would perhaps switch the arguments so you can specify a name pattern and then a list of files to search, à la `grep`:
```sh
ff () {
    # "local" is not POSIX but works in many sh implementations
    local pat=$1
    shift
    parallel -j8 find {} -type f -iname "$pat" ::: "$@"
}
```
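For example (with made-up directory names), this searches two trees in parallel for Ogg files, matching the name case-insensitively:

```sh
ff '*.ogg' ~/Music ~/Podcasts
```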
But again, spinning your disk to find things which are already indexed is probably something you should stop doing, rather than facilitate.