How to find duplicate filenames (recursively) in a given directory? BASH

执念已碎 2021-02-04 08:06

I need to find every duplicate filename in a given directory tree. I don't know which directory tree the user will pass as the script argument, so I don't know the directory hierarchy. I tried thi

7 answers
  • 2021-02-04 08:28

    One "find" command only:

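    # Collect every file path once, then reuse the list.
    lst=$( find . -type f )
    echo "$lst" | rev | cut -f 1 -d/ | rev | sort -f | uniq -i | while IFS= read -r f; do
       # Paths whose last component matches this name, ignoring case.
       # Note: $f is used as a regex, so names with special characters may over-match.
       names=$( echo "$lst" | grep -i -- "/$f$" )
       n=$( echo "$names" | wc -l )
       [ "$n" -gt 1 ] && echo -e "Duplicates found ($n):\n$names"
    done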
    
  • 2021-02-04 08:28

    Here is my contribution. It only looks at one file type (PDFs in this case), but it does so recursively:

    #!/usr/bin/env bash
    
    find . -type f | while IFS= read -r filename; do
        filename=$(basename -- "$filename")
        extension="${filename##*.}"
        if [[ $extension == "pdf" ]]; then
            # Count files with the same name (case-insensitive) anywhere under the current directory.
            fileNameCount=$(find . -type f -iname "$filename" | wc -l)
            if [[ $fileNameCount -gt 1 ]]; then
                echo "File Name: $filename, count: $fileNameCount"
            fi
        fi
    done
    
  • 2021-02-04 08:32
    #!/bin/sh
    dirname=/path/to/check
    # Print the basename of every file, then let awk report names seen more than once.
    find "$dirname" -type f |
    while IFS= read -r vo
    do
      basename "$vo"
    done | awk '{arr[$0]++} END{for (i in arr){if(arr[i]>1){print i}}}'
    
  • 2021-02-04 08:33

    Yes, this is a really old question, but all those loops and temporary files seem a bit cumbersome.

    Here's my 1-line answer:

    find /PATH/TO/FILES -type f -printf '%p/ %f\n' | sort -k2 | uniq -f1 --all-repeated=separate
    

    It has its limitations due to uniq and sort:

    • no whitespace (space, tab) in filename (will be interpreted as new field by uniq and sort)
    • needs file name printed as last field delimited by space (uniq doesn't support comparing only 1 field and is inflexible with field delimiters)

    But it is quite flexible regarding its output thanks to find -printf and works well for me. Also seems to be what @yak tried to achieve originally.

    Demonstrating some of the options you have with this:

    find  /PATH/TO/FILES -type f -printf 'size: %s bytes, modified at: %t, path: %h/, file name: %f\n' | sort -k15 | uniq -f14 --all-repeated=prepend
    

    Also there are options in sort and uniq to ignore case (as the topic opener intended to achieve by piping through tr). Look them up using man uniq or man sort.
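
As a rough sketch of those case-insensitive options applied to the one-liner above: `sort -f` folds case when ordering and `uniq -i` ignores case when comparing. This assumes GNU find, sort and uniq; the function name `dup_names_nocase` is only an illustrative wrapper, and the whitespace limitation described above still applies.

```shell
# dup_names_nocase DIR: same pipeline as the one-liner, but names that differ
# only in case (e.g. Foo.txt and foo.TXT) are grouped as duplicates.
# Hypothetical helper name; assumes GNU find, sort and uniq.
dup_names_nocase() {
    find "$1" -type f -printf '%p/ %f\n' |
    sort -f -k2 |
    uniq -i -f1 --all-repeated=separate
}
```

Called as `dup_names_nocase /PATH/TO/FILES`, it prints the same "path/ name" lines as the original command.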

  • 2021-02-04 08:33

    This solution writes one temporary file to a temporary directory for every unique filename found. Each temporary file holds the path where that filename was first seen, so that I can print it later. So it creates a lot more files than the other posted solutions, but it was something I could understand.

    Following is the script, named fndupe.

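    #!/bin/bash
    
    # Create a temp directory to contain placeholder files,
    # and remove it again when the script exits.
    tmp_dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp_dir"' EXIT
    
    # Get paths of files to test from standard input.
    while IFS= read -r p; do
      fname=$(basename "$p")
      tmp_path=$tmp_dir/$fname
      if [[ -e $tmp_path ]]; then
        # Name seen before: report this path and the first one recorded.
        q=$(cat "$tmp_path")
        echo "duplicate: $p"
        echo "    first: $q"
      else
        # First time this name appears: remember where it was found.
        echo "$p" > "$tmp_path"
      fi
    done
    
    exit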
    

    Following is an example of using the script.

    $ find . -name '*.tif' | fndupe
    

    Following is example output when the script finds duplicate filenames.

    duplicate: a/b/extra/gobble.tif
        first: a/b/gobble.tif
    

    Tested with Bash version: GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
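
For comparison, the same first-seen bookkeeping can probably be kept in memory instead of in placeholder files. The following is only a sketch, not the script above: it assumes a POSIX awk, paths without embedded newlines, and uses a made-up function name `fndupe_awk`. It splits each path on `/` and keys an awk array on the last component.

```shell
# fndupe_awk: read paths on stdin; for every basename already seen,
# print the new path and the first path recorded for that name.
# Hypothetical in-memory variant of fndupe; no temp files are created.
fndupe_awk() {
    awk -F/ '
        ($NF in first) { print "duplicate: " $0; print "    first: " first[$NF]; next }
        { first[$NF] = $0 }
    '
}
```

It is used the same way, for example `find . -name '*.tif' | fndupe_awk`, and prints the same two-line report per duplicate.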

  • 2021-02-04 08:36
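    #!/bin/bash
    
    file=$(mktemp /tmp/duplicates.XXXXX) || { echo "Error creating tmp file"; exit 1; }
    find "$1" -type f | sort > "$file"
    # Sort the lowercased basenames so that uniq -c sees equal names on adjacent lines.
    awk -F/ '{print tolower($NF)}' "$file" |
            sort |
            uniq -c |
            awk '$1>1 { sub(/^[[:space:]]+[[:digit:]]+[[:space:]]+/,""); print }' |
            while IFS= read -r line;
                    # Match the name only as the last path component.
                    do grep -i -- "/$line\$" "$file";
            done
    
    rm "$file"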
    

    It also works with spaces in filenames. Here's a simple test (the first argument is the directory):

    ./duplicates.sh ./test
    ./test/2/INC 255286
    ./test/INC 255286
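
The same idea (key on the last path component, lowercased) can also be sketched as a single awk pass with no temporary file. This is an assumption-laden variant rather than the script above: `dup_names_ci` is a hypothetical name, file names must not contain newlines, and the order in which groups are printed is whatever awk's array iteration happens to give.

```shell
# dup_names_ci DIR: print every path under DIR whose lowercased basename
# occurs more than once. Spaces in names are fine because the only field
# separator is "/". Hypothetical helper, not part of the original answer.
dup_names_ci() {
    find "$1" -type f |
    awk -F/ '
        { key = tolower($NF); count[key]++; paths[key] = paths[key] $0 "\n" }
        END { for (k in count) if (count[k] > 1) printf "%s", paths[k] }
    '
}
```

For example, `dup_names_ci ./test` should list both `INC 255286` paths from the test above.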
    