md5 all files in a directory tree

庸人自扰 2021-02-07 04:43

I have a directory with a structure like so:

.
├── Test.txt
├── Test1
│   ├── Test1.txt
│   ├── Test1_copy.txt
│   └── Test1a
│       ├── Test1a.txt
│       └─         


        
6 Answers
  • 2021-02-07 05:17

    Using md5deep

    md5deep -r path/to/dir > sums.md5
    

    Using find and md5sum

    find relative/path/to/dir -type f -exec md5sum {} + > sums.md5
    

    Be aware that when you check your MD5 sums with md5sum -c sums.md5, you need to run it from the same directory from which you generated the sums.md5 file. This is because find outputs paths relative to your current location, and those paths are what end up in sums.md5.

    If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of your path). This way you can check sums.md5 from any location. The disadvantage is that sums.md5 now contains absolute paths, which makes it bigger.
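
    To make the working-directory dependency concrete, here is a minimal sketch. The function names make_sums and check_sums are hypothetical, and md5sum is assumed to be installed. Both functions run in a subshell that first changes into the target directory, so the recorded paths are relative to that directory and the check works no matter where it is called from:

    ```shell
    # Hypothetical helpers (not part of md5sum). The ( ... ) function body
    # runs in a subshell, so the caller's working directory is untouched.

    # Write sums.md5 inside directory $1, with paths relative to $1.
    make_sums() (
      cd -- "$1" || exit 1
      find . -type f ! -name sums.md5 -exec md5sum {} + > sums.md5
    )

    # Verify the sums recorded in $1/sums.md5.
    check_sums() (
      cd -- "$1" || exit 1
      md5sum -c sums.md5
    )
    ```

    With these, make_sums path/to/dir followed later by check_sums path/to/dir should behave the same from any starting directory.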

    Fully featured function using find and md5sum

    You can put this function in your .bashrc file (located in your $HOME directory):

    function md5sums {
      if [ "$#" -lt 1 ]; then
        printf '%s\n' "At least one parameter is expected" \
                      "Usage: md5sums [OPTIONS] dir" >&2
        return 1
      else
        local OUTPUT="checksums.md5"
        local CHECK=false
        local MD5SUM_OPTIONS=()  # extra options passed through to md5sum
    
        while [ "$#" -gt 1 ]; do  # the last argument is the directory
          local key="$1"
          case $key in
            -c|--check)
              CHECK=true
              ;;
            -o|--output)
              OUTPUT=$2
              shift
              ;;
            *)
              MD5SUM_OPTIONS+=("$1")
              ;;
          esac
          shift
        done
        local DIR=$1
    
        if [ -d "$DIR" ]; then  # if $DIR directory exists
          cd "$DIR" || return  # change to $DIR directory
          if [ "$CHECK" = true ]; then  # if -c or --check option specified
            md5sum --check "${MD5SUM_OPTIONS[@]}" "$OUTPUT"  # check MD5 sums in $OUTPUT file
          else
            # Calculate MD5 sums for files in the current directory and its
            # subdirectories, excluding the $OUTPUT file, and save them in $OUTPUT
            find . -type f ! -name "$OUTPUT" -exec md5sum "${MD5SUM_OPTIONS[@]}" {} + > "$OUTPUT"
          fi
          cd - > /dev/null  # change back to the previous directory
        else
          cd "$DIR"  # if $DIR doesn't exist, cd to it to generate a localized error message
        fi
      fi
    }
    

    After you run source ~/.bashrc, you can use md5sums like a normal command:

    md5sums path/to/dir
    

    will generate a checksums.md5 file in the path/to/dir directory, containing the MD5 sums of all files in this directory and its subdirectories. Use:

    md5sums -c path/to/dir
    

    to check the sums from the path/to/dir/checksums.md5 file.

    Note that path/to/dir can be relative or absolute; md5sums will work fine either way. The resulting checksums.md5 file always contains paths relative to path/to/dir. You can use a different file name than the default checksums.md5 by supplying the -o or --output option. All options other than -c, --check, -o and --output are passed to md5sum.

    The first half of the md5sums function definition parses the options. See this answer for more information about it. The second half contains explanatory comments.

  • 2021-02-07 05:21

    Updated Answer

    If you like the answer below, or any of the others, you can make a function that does the command for you. So, to test it, type the following into Terminal to declare a function:

    function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; }
    

    Then you can just use:

    sumthem /Users/somebody/somewhere
    

    If that works how you like, you can add that line to the end of your "bash profile" and the function will be declared and available whenever you are logged in. Your "bash profile" is probably in $HOME/.profile

    Original Answer

    Why not get all your CPU cores working in parallel for you?

    find . -type f -print0 | parallel -0 -X md5sum
    

    This finds all the files (-type f) in the current directory (.) and prints them with a null byte at the end. These are then passed into GNU Parallel, which is told that the filenames end with a null byte (-0), that it should process as many files as possible at a time (-X) to avoid creating a new process for each file, and that it should run md5sum on the files.

    This approach will pay the largest bonus, in terms of speed, with big images like Photoshop files.

  • 2021-02-07 05:24

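    Use the find command to list all files in the directory tree, then use xargs to pass them to the md5sum command. The -print0 and -0 options delimit file names with a null byte, so names containing white space are handled correctly:

    find dirname -type f -print0 | xargs -0 md5sum > checksums.md5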
    
  • 2021-02-07 05:26

    How about:

    find /path/you/need -type f -exec md5sum {} \; > checksums.md5

    Update#1: Improved the command based on @twalberg's recommendation to handle white spaces in file names.

    Update#2: Improved based on @jil's suggestion, to remove unnecessary xargs call and use -exec option of find instead.

    Update#3: @Blake a naive implementation of your script would look something like this:

    #!/bin/bash
    # Usage: checksumchecker.sh <path>
    find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5
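
    The commands above end -exec with \;, which starts one md5sum process per file. As a sketch of the alternative, ending -exec with + passes as many file names as fit on a command line to each md5sum process, which usually means far fewer processes on large trees. The two function names below are hypothetical and exist only to put the forms side by side:

    ```shell
    # Hypothetical helpers for comparing the two -exec forms.

    # One md5sum process per file.
    sums_one_by_one() {
      find "$1" -type f -exec md5sum {} \;
    }

    # As many file names as fit on a command line per md5sum process.
    sums_batched() {
      find "$1" -type f -exec md5sum {} +
    }
    ```

    Both forms print the same checksum lines for the same tree, so either output can be redirected to a checksums file.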
    
  • 2021-02-07 05:40
    #!/bin/bash
    shopt -s globstar
    md5sum "$1"/** > "${1}__checksums.md5"
    

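    Explanation: shopt -s globstar (see the Bash manual) enables the ** recursive glob wildcard, which requires Bash 4.0 or later. With it, "$1"/** expands to the list of all files and directories found recursively under the directory given as parameter $1. The script then simply calls md5sum with this list as its arguments, and > "${1}__checksums.md5" redirects the output to the file. md5sum prints an error message for each directory in the list, but the checksums of the regular files are still written.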
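
    Since ** matches directories as well as files, a variant that hands only regular files to md5sum avoids the "Is a directory" messages. This is a minimal sketch, assuming Bash 4.0 or later; the function name sum_files_glob is hypothetical. It starts one md5sum per file, which is simple rather than fast:

    ```shell
    # Hypothetical helper: like md5sum "$1"/**, but skips directories.
    # Requires Bash 4.0 or later for globstar.
    sum_files_glob() (
      # The ( ... ) body is a subshell, so these options do not leak
      # into the calling shell.
      shopt -s globstar nullglob
      for path in "$1"/**; do
        if [ -f "$path" ]; then
          md5sum "$path"
        fi
      done
    )
    ```

    As with any glob, names beginning with a dot are skipped unless the dotglob option is also set.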

  • 2021-02-07 05:40
    md5deep -r "$your_directory" | awk '{print $1}' | sort | md5sum | awk '{print $1}'
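
    The pipeline above reduces a whole tree to one value: it hashes every file, keeps only the hash column, sorts the hashes so the result does not depend on traversal order, and then hashes that list. Below is a sketch of the same idea with find and md5sum in place of md5deep, a substitution made here only to avoid the extra dependency. The function name tree_fingerprint is hypothetical:

    ```shell
    # Hypothetical helper: print a single MD5 value for the contents of
    # all regular files under directory $1.
    tree_fingerprint() {
      find "$1" -type f -exec md5sum {} + |
        awk '{print $1}' |   # keep only the hash column
        LC_ALL=C sort |      # fixed ordering, independent of the locale
        md5sum |
        awk '{print $1}'
    }
    ```

    Because file names are discarded, renaming or moving a file inside the tree does not change the result; only file contents count.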
    