I have a large number of directories with just one file -- index.html -- in each directory. I want to use grep to look for a pattern in each file and then copy the matching directory, along with its file, to a destination directory.
To preserve the directory structure, use `cpio` in pass-through mode. `cpio` is about as old as `tar` and used to have more advantages, but it has kind of slipped into obscurity. I'm new to it and mostly followed an ancient Linux Journal cpio guide to build this command:
```
mkdir dest_dir
cd source_dir
grep -Zlr "string" . | cpio -p0dmv ../dest_dir
```
This passes a null-terminated* list of files matching your criteria through a pipe directly into `cpio`, which is designed to take a list of files in this manner and then either archive them or copy them ("pass-through", `-p`). We do the latter here, preserving the directory structure (`-d`) as well as modification times (`-m`). I've set this to verbose (`-v`) so you can watch the progress. If you're connected via `ssh`, you might not want that, since rendering each filename over the network can slow the process down.
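For readability, here is the same pipeline spelled out with long option names; this is just a sketch assuming GNU grep and GNU cpio (the short options above are more portable):

```
mkdir dest_dir
cd source_dir
# Identical to `grep -Zlr ... | cpio -p0dmv ...`, using GNU long options.
grep --null --files-with-matches --recursive "string" . |
  cpio --pass-through --null --make-directories \
       --preserve-modification-times --verbose ../dest_dir
```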
* Regarding null termination: I used `grep -Zl` with `cpio -0` to work around the issue of file names containing newlines (don't do that!); `grep -Zl` lists all matching files delimited by null characters (the only invalid character for a path) and `cpio -0` expects null-terminated input (as does `xargs -0`).
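To see the problem this guards against, here is a quick sketch assuming bash (whose `$'...'` quoting can embed a newline in a file name) and GNU grep; the `src` directory and file name are made up for the demo:

```
mkdir -p src
printf 'string\n' > src/$'bad\nname.html'     # a file name containing a newline
grep -lr  "string" src | wc -l                # 2 lines: one path split in two
grep -Zlr "string" src | tr -cd '\0' | wc -c  # 1 NUL terminator: one real path
```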
I originally recommended `tar` to create a temporary archive and a second `tar` to extract it into the new location. This used `xargs` to convert the file list into arguments (though GNU `tar` can also read a file list from standard input, via `--null -T -`, much as `cpio` does). The trouble is that `xargs` splits commands that are too long into multiple calls, and `tar` can't extract the concatenated output**.
```
mkdir dest_dir
cd source_dir
grep -Zlr "string" . | xargs -0 tar -cpf - | tar -xpif - --directory=../dest_dir
```
This makes your destination directory, enters the source directory, and runs `grep` with `-Zl` (null-terminated file list*) and `-r` (recursive). `xargs -0` turns that list into arguments for `tar`, which archives them to standard output. A second `tar` instance then extracts them into the destination directory.
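If you have GNU `tar`, here is a sketch of the variant mentioned above that skips `xargs` entirely by reading the null-delimited list straight from standard input; I haven't tested it, so treat the behavior as an assumption about your `tar`:

```
mkdir dest_dir
cd source_dir
# --null must precede -T so the list read from stdin is NUL-delimited.
grep -Zlr "string" . | tar -cpf - --null -T - | tar -xpf - --directory=../dest_dir
```

Because only one archiving `tar` runs, there is no concatenation to worry about and no `-i` is needed on the extracting side.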
** `xargs` defaults to `--max-procs=1` and so runs one process at a time, but it still invokes `tar` repeatedly when the argument list is too long, producing multiple tarballs concatenated on the pipe. Concatenated archives are nominally valid tar, but each tarball ends with blocks of zeros, so a plain extracting `tar` stops after the first one; further reading suggested the simple fix of adding `-i` (ignore zeros) to the extracting `tar` so it reads past those blocks. I added it to the code above but have not tested it.
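As a sanity check of that behavior, here is a minimal sketch assuming GNU tar; the file names are placeholders:

```
printf one > file1; printf two > file2
tar -cf a.tar file1
tar -cf b.tar file2
cat a.tar b.tar > both.tar
tar -tf  both.tar   # lists only file1: tar stops at a.tar's end-of-archive zeros
tar -tif both.tar   # lists file1 and file2: -i (--ignore-zeros) reads past them
```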