Percentage value with GNU Diff

孤城傲影 2021-02-03 11:42

What is a good method for using diff to show a percentage difference between two files?

Such as if a file has 100 lines and a copy has 15 lines that have been changed th

3 Answers
  • 2021-02-03 12:19

    Something like this perhaps?

    Two files, A1 and A2.

    $ sdiff -B -b -s A1 A2 | wc -l

    gives the number of lines that differ. wc -l A1 gives the total line count; divide the first number by the second.

    The -b option ignores changes in the amount of whitespace, -B ignores blank lines, and -s suppresses the lines common to both files.
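The recipe above can be wrapped in a small function. This is only a sketch: diff_percent is a hypothetical name (not part of diffutils), the percentage is taken relative to the line count of the first file, and the result is truncated to a whole number.

```shell
#!/bin/bash
# diff_percent is a hypothetical helper name, not a diffutils command.
# Prints the percentage of lines in file A that sdiff reports as
# differing from file B (integer arithmetic, truncated).
diff_percent() {
    local a=$1 b=$2 total changed
    total=$(wc -l < "$a")
    # -s suppresses common lines, so every output line is a difference.
    changed=$(sdiff -B -b -s "$a" "$b" | wc -l)
    # Assumption: report 0 for an empty file A instead of dividing by zero.
    if [ "$total" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( 100 * changed / total ))
}
```

Called as diff_percent A1 A2, it should print 15 for the 100-line file with 15 changed lines described in the question. Lines that exist only in A2 also count as differences, so the result can exceed 100 when A2 is much longer than A1.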

  • 2021-02-03 12:28

    https://superuser.com/questions/347560/is-there-a-tool-to-measure-file-difference-percentage has a neat solution for this:

    wdiff -s file1.txt file2.txt
    

    For more options, see man wdiff.

  • 2021-02-03 12:30

    Here's a script that will compare all .txt files and display the ones that have more than 15% duplication:

    #!/bin/bash
    
    # walk through all files in the current dir (and subdirs)
    # and compare them with other files, showing percentage
    # of duplication.
    
    # which type files to compare?
    # (wouldn't make sense to compare binary formats)
    ext="txt"
    
    # support filenames with spaces:
    IFS=$(echo -en "\n\b")
    
    working_dir="$PWD"
    working_dir_name=$(echo $working_dir | sed 's|.*/||')
    all_files="$working_dir/../$working_dir_name-filelist.txt"
    remaining_files="$working_dir/../$working_dir_name-remaining.txt"
    
    # get information about files:
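    # (size and name per file, hidden paths skipped, largest first;
    # the pattern is anchored so that e.g. notes.txt.bak is not matched)
    find . -type f -print0 | xargs -0 stat -c "%s %n" | grep -v "/\." | \
         grep "\.$ext\$" | sort -nr > "$all_files"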
    
    cp $all_files $remaining_files
    
    while read string; do
        fileA=$(echo $string | sed 's/.[^.]*\./\./')
        tail -n +2 "$remaining_files" > $remaining_files.temp
        mv $remaining_files.temp $remaining_files
        # remove empty lines since they produce false positives
        sed '/^$/d' $fileA > tempA
    
        echo Comparing $fileA with other files...
    
        while read string; do
            fileB=$(echo $string | sed 's/.[^.]*\./\./')
            sed '/^$/d' $fileB > tempB
            A_len=$(cat tempA | wc -l)
            B_len=$(cat tempB | wc -l)
    
            differences=$(sdiff -B -s tempA tempB | wc -l)
            common=$(expr $A_len - $differences)
    
            percentage=$(echo "100 * $common / $B_len" | bc)
            if [[ $percentage -gt 15 ]]; then
                echo "  $percentage% duplication in" \
                     "$(echo $fileB | sed 's|\./||')"
            fi
        done < "$remaining_files"
        echo " "
    done < "$all_files"
    
    # clean up; -f because tempB does not exist if the inner loop never ran
    rm -f tempA tempB "$all_files" "$remaining_files"
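
The duplication figure in the script above is plain integer arithmetic on three line counts. Here is a minimal sketch of just that step, using a hypothetical helper name dup_percent and made-up counts. The guard against an empty second file is an addition of this sketch, not something the script above does.

```shell
#!/bin/bash
# dup_percent is a hypothetical helper, not part of the script above.
# Arguments: lines in A, differing lines reported by sdiff, lines in B.
dup_percent() {
    local a_len=$1 differences=$2 b_len=$3 common
    # Assumption: treat an empty B as 0% rather than dividing by zero.
    if [ "$b_len" -eq 0 ]; then
        echo 0
        return
    fi
    common=$(( a_len - differences ))
    # Integer division truncates, as bc does with its default scale of 0.
    echo $(( 100 * common / b_len ))
}

dup_percent 100 15 100   # prints 85
```

With 100 lines in A, 15 of them differing, and 100 lines in B, 85 lines are common, so the pair is reported at 85% duplication, well above the script's 15% threshold.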
    