How to compare and sort 2 CSVs to show the difference

我在风中等你 2021-01-24 18:28

Hi, I have 2 CSVs in the following format (basically a list of email addresses and the number of times we have been emailed by each sender):

file1.csv

Email,Val         


        
4 answers
  • 2021-01-24 19:02

    Pretty straight-forward with Awk!

    awk 'BEGIN{FS=OFS=","; printf "Name,Value1,Value2\n"}NR >1 && FNR==NR{map[$1]=$2; next}$1 in map{$(NF+1)=map[$1]; print}' file2 file1
    

    produces

    Name,Value1,Value2
    email1@email.com,2,3
    email2@email.com,4,6
    email3@email.com,1,8
    email4@email.com,6,2
    

    The BEGIN clause runs before any input is read: it sets the input and output field separators to a comma and prints the final header. The condition FNR==NR is true only while the first file on the command line (file2 in this case) is being read, and NR>1 skips that file's header line; for those lines the script builds a hash map keyed on $1 with $2 as the value. Then, for each line of file1 whose $1 is a key in the map, it appends the stored value as a new field, $(NF+1), meaning one past the last field, and prints the resulting line. The header line of file1 is not printed again because its first field is not a key in the map.
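To try the lookup idiom described above end to end, here is a minimal sketch. The sample rows are inferred from the output shown in the answer; the header text `Email,Value`, the temporary directory, and the helper name `merge_counts` are assumptions made for illustration, not part of the original post.

```shell
# Hypothetical sample inputs, reconstructed from the output shown above.
workdir=$(mktemp -d)
printf '%s\n' 'Email,Value' 'email1@email.com,2' 'email2@email.com,4' \
    'email3@email.com,1' 'email4@email.com,6' > "$workdir/file1.csv"
printf '%s\n' 'Email,Value' 'email1@email.com,3' 'email2@email.com,6' \
    'email3@email.com,8' 'email4@email.com,2' > "$workdir/file2.csv"

# merge_counts LOOKUP MAIN (hypothetical helper): LOOKUP is loaded into a
# map, MAIN is then streamed and decides the order of the output rows.
merge_counts() {
    awk 'BEGIN { FS = OFS = ","; print "Email,Value1,Value2" }
         FNR == 1  { next }                    # skip the header of each file
         FNR == NR { map[$1] = $2; next }      # first file: remember its counts
         $1 in map { print $1, $2, map[$1] }   # second file: emit matched rows
    ' "$1" "$2"
}

merge_counts "$workdir/file2.csv" "$workdir/file1.csv"
```

This variant prints the three fields explicitly instead of appending `$(NF+1)`, which keeps the output stable if the main file ever carries extra columns. Emails that appear in only one file are dropped, as in the answer's version.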

  • 2021-01-24 19:11

    If you want to keep the order,

    awk to the rescue!

    $ awk  'BEGIN   {FS=OFS=","}
            NR==FNR {a[$1]=$2; next} 
            FNR==1  {print $1,$2"1",a[$1]"2"; next} 
                    {print $1,$2,a[$1]}' file2 file1
    
    Email,Value1,Value2
    email1@email.com,2,3
    email2@email.com,4,6
    email3@email.com,1,8
    email4@email.com,6,2
    

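    Note the order of the files: file2 is read first to build the lookup array, and file1 is read second, so the output follows the row order of file1.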

  • 2021-01-24 19:15
    1. Build a loop that runs through each line of the first file.

    2. Inside that loop, build a second loop that compares each line of the second file with the current line of the first file.

    3. Write the matches to your new file.
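The three steps above can be sketched in plain shell as follows. The sample data, the output file name `merged.csv`, and the helper name `match_rows` are hypothetical, and the sketch assumes each file has one header line and ends with a newline.

```shell
# Hypothetical sample inputs for the demonstration.
demo=$(mktemp -d)
printf '%s\n' 'Email,Value' 'email1@email.com,2' 'email2@email.com,4' \
    > "$demo/file1.csv"
printf '%s\n' 'Email,Value' 'email2@email.com,6' 'email1@email.com,3' \
    > "$demo/file2.csv"

# match_rows FILE1 FILE2 (hypothetical helper): prints email,value1,value2
# for every email that occurs in both files, in the order of FILE1.
match_rows() {
    # Step 1: outer loop over the data lines of the first file.
    tail -n +2 "$1" | while IFS=, read -r email1 value1; do
        # Step 2: inner loop rescans the second file for the current email.
        tail -n +2 "$2" | while IFS=, read -r email2 value2; do
            if [ "$email1" = "$email2" ]; then
                printf '%s,%s,%s\n' "$email1" "$value1" "$value2"
            fi
        done
    done
}

# Step 3: write the matches to a new file.
match_rows "$demo/file1.csv" "$demo/file2.csv" > "$demo/merged.csv"
cat "$demo/merged.csv"
```

The second file is reread once per line of the first file, so the work grows with the product of the two line counts. For large inputs, the single-pass awk lookups in the other answers are likely to be much faster.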

  • 2021-01-24 19:16

    Using the join program:

    join -t, -o0,1.2,2.2 -a1 -a2 <(sort <file1.csv) <(sort <file2.csv)
    

    Otherwise, if the files are already sorted and contain the same entries, it can be done with bash builtins:

    while
        IFS=, read -u3 em1 val1
        IFS=, read -u4 em2 val2
        [[ -n $em1 ]] && [[ -n $em2 ]]
    do
        if [[ $em1 = "$em2" ]]; then
            echo "$em1,$val1,$val2"
        else
            echo "ERROR: $em1 <> $em2"
        fi
    done 3<file1.csv 4<file2.csv
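One thing to watch with the join command above is that it sorts the header row together with the data. The following sketch is a variation, not the answer's own method: it strips the headers first, prints a fixed header, and uses `-e 0` so that an email missing from one file gets a count of 0. The sample data, the helper name `join_counts`, and the choice of 0 as the filler are assumptions.

```shell
# Hypothetical sample inputs; email5 appears only in the first file.
jdir=$(mktemp -d)
printf '%s\n' 'Email,Value' 'email2@email.com,4' 'email1@email.com,2' \
    'email5@email.com,7' > "$jdir/file1.csv"
printf '%s\n' 'Email,Value' 'email1@email.com,3' 'email2@email.com,6' \
    > "$jdir/file2.csv"

# join_counts FILE1 FILE2 (hypothetical helper).
join_counts() {
    tmpdir=$(mktemp -d)
    # Drop each header, then sort on the email key in the C locale so that
    # sort and join agree on the ordering.
    tail -n +2 "$1" | LC_ALL=C sort -t, -k1,1 > "$tmpdir/a"
    tail -n +2 "$2" | LC_ALL=C sort -t, -k1,1 > "$tmpdir/b"
    printf 'Email,Value1,Value2\n'
    # -a1 -a2 keep unpaired rows; -e 0 fills their missing value with 0.
    LC_ALL=C join -t, -a1 -a2 -e 0 -o 0,1.2,2.2 "$tmpdir/a" "$tmpdir/b"
    rm -rf "$tmpdir"
}

join_counts "$jdir/file1.csv" "$jdir/file2.csv"
```

Temporary files are used instead of process substitution so the sketch also runs in a plain POSIX shell. The output is ordered by email rather than by the original row order of either file.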
    