More efficient way to find & tar millions of files

一向 2021-01-30 17:53

I've had a job running on my server at the command-line prompt for two days now:

find data/ -name filepattern-*2009* -exec tar uf 2009.tar {} \;
9 Answers
  • 2021-01-30 18:32

    If you have already run the second command, the one that created the file list, just use the -T option to tell tar to read the file names from that saved list. Running one tar command instead of N tar commands will be much faster.
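
As a rough sketch of the single-invocation approach: the snippet below builds a small scratch tree, saves the matching names to a list, and then hands that list to one tar run with -T. The scratch directory, the sample file names, and filelist.txt are invented for illustration. It assumes file names without embedded newlines, since the list is newline-separated, and it creates a fresh archive with -c; swap in -u to update an existing one.

```shell
# Scratch area with a few sample files (all names here are hypothetical).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data/sub
touch data/filepattern-a2009a data/sub/filepattern-b2009b data/other-2008

# Step 1: write the matching names to a list, one per line.
# The pattern is quoted so that find, not the shell, expands the stars.
find data -name 'filepattern-*2009*' > filelist.txt

# Step 2: a single tar invocation reads every name from the list.
tar -cf 2009.tar -T filelist.txt

# List the archive contents to check the result.
tar -tf 2009.tar
```

The expensive part, walking the tree, happens once in step 1, and tar opens the archive only once in step 2.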

  • 2021-01-30 18:40

    Here's a find-tar combination that can do what you want without the use of xargs or exec (which should result in a noticeable speed-up):

    tar --version    # tar (GNU tar) 1.14 
    
    # FreeBSD find (on Mac OS X)
    find -x data -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -
    
    # for GNU find use -xdev instead of -x
    gfind data -xdev -name "filepattern-*2009*" -print0 | tar --null --no-recursion -uf 2009.tar --files-from -
    
    # added: set permissions via tar
    find -x data -name "filepattern-*2009*" -print0 | \
        tar --null --no-recursion --owner=... --group=... --mode=... -uf 2009.tar --files-from -
    
  • 2021-01-30 18:45

    There is a utility for this called tarsplitter.

    tarsplitter -m archive -i folder/*.json -o archive.tar -p 8
    

    will use 8 threads to archive the files matching "folder/*.json" into an output archive of "archive.tar"

    https://github.com/AQUAOSOTech/tarsplitter

  • 2021-01-30 18:46

    Simplest (note that this creates one compressed archive per file and, with GNU tar's --remove-files, deletes each file after it has been archived):

    find *.1  -exec tar czf '{}.tgz' '{}' --remove-files \;
    
  • 2021-01-30 18:52

    The way you currently have things, you are invoking the tar command once for every file that find matches, which, not surprisingly, is slow. Instead of paying the two hours it takes to print the names plus the time it takes to open the tar archive, check whether the files are out of date, and add them to the archive, you are in effect multiplying those times together. You will probably have better success invoking tar once, after you have batched all the names together, possibly using xargs to do the invocation. By the way, I hope you are using 'filepattern-*2009*' and not filepattern-*2009*, because without quotes the shell may expand the stars before find ever sees them.
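
A minimal sketch of the batching idea, assuming a find and xargs that support -print0 and -0 (the GNU and BSD versions both do). xargs packs as many names as fit onto each tar command line, so tar runs a handful of times instead of once per file. Because xargs may still split a very long list across several invocations, the sketch keeps the u (update) mode from the question rather than c, which would overwrite the archive on every batch. The scratch directory and sample names are invented for the demo.

```shell
# Scratch area with sample files (all names here are hypothetical).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data
for i in 1 2 3 4 5; do
    touch "data/filepattern-${i}-2009-x"
done
touch "data/filepattern-with space-2009"

# Quote the pattern so that find, not the shell, expands the stars.
# -n 2 forces tiny batches only to show that updating across several
# tar invocations still collects every file; drop it in real use.
find data -name 'filepattern-*2009*' -print0 | xargs -0 -n 2 tar -uf 2009.tar

# List what ended up in the archive.
tar -tf 2009.tar | sort
```

With six sample files and batches of two, tar runs three times here, and the archive still ends up with all six entries.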

  • 2021-01-30 18:54

    To correctly handle file names with weird (but legal) characters (such as newlines, ...) you should write your file list to filesOfInterest.txt using find's -print0:

    find -x data -name "filepattern-*2009*" -print0 > filesOfInterest.txt
    tar --null --no-recursion -uf 2009.tar --files-from filesOfInterest.txt 
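
To see why the NUL-terminated list matters, here is a small sketch under made-up conditions: a scratch directory holding one ordinary file and one whose name contains a newline. It leaves out the BSD-only -x flag, creates the archive with -c instead of updating it, and counts the names in the list by counting NUL bytes. The directory and file names are hypothetical.

```shell
# Scratch area (all names here are hypothetical).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p data

# One ordinary name and one with an embedded newline.
touch data/filepattern-plain-2009
touch "$(printf 'data/filepattern-a\nb-2009')"

# NUL-terminated list: every name ends with a \0 byte, so an embedded
# newline cannot be mistaken for a separator.
find data -name 'filepattern-*2009*' -print0 > filesOfInterest.txt

# Count the names by counting NUL bytes in the list.
listed=$(tr -cd '\000' < filesOfInterest.txt | wc -c | tr -d ' ')
echo "names in list: $listed"

# --null tells tar that the names in the list are NUL-terminated.
tar --null --no-recursion -cf 2009.tar --files-from filesOfInterest.txt
tar -tf 2009.tar
```

A newline-separated list would have split the second name into two entries that do not exist on disk, and tar would have reported both as missing.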
    