How to delete duplicated rows based on a column value?

囚心锁ツ 2021-01-05 06:03

Given the following table

 123456.451 entered-auto_attendant
 123456.451 duration:76 real:76
 139651.526 entered-auto_attendant
 139651.526 duration:62 real:62
 139382.537 entered-auto_attendant
4 Answers
  • 2021-01-05 06:44

    You didn't give an expected output; does this work for you?

     awk '!a[$1]++' file
    

    With your data, the output is:

    123456.451 entered-auto_attendant
    139651.526 entered-auto_attendant
    139382.537 entered-auto_attendant
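
    How it works: a[$1]++ evaluates to the count seen so far for column 1 and only then increments it, so !a[$1]++ is true exactly once per key. A rough longhand equivalent (my expansion, not part of the original answer):

     awk '{
         if (count[$1] == 0) print   # first time this key appears: keep the line
         count[$1]++                 # remember the key for later lines
     }' file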
    

    And this one prints only the lines whose first column appears exactly once:

     awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file
    

    output:

    139382.537 entered-auto_attendant
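
    Note that for (x in a) visits keys in an unspecified order, so with more than one unique key the output may not follow the input order. A two-pass variant that reads the file twice and preserves input order (a sketch, not part of the original answer):

     awk 'NR==FNR{count[$1]++; next} count[$1]==1' file file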
    
  • 2021-01-05 06:48

    uniq, by default, compares the entire line. Since your lines are not identical, they are not removed.

    You can use sort to sort by the first field and delete duplicate keys in one step:

    sort -t ' ' -k 1,1 -u file
    
    • -t ' ': fields are separated by spaces
    • -k 1,1: only look at the first field
    • -u: delete duplicates
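
    On the sample data this yields one line per key, sorted by the key. Assuming GNU sort, where -u also disables the last-resort whole-line comparison so the earliest input line for each key is the one kept, the output should be:

    123456.451 entered-auto_attendant
    139382.537 entered-auto_attendant
    139651.526 entered-auto_attendant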

    Additionally, you might have seen the awk '!a[$0]++' trick for deduplicating lines. You can make this dedupe on the first column only using awk '!a[$1]++'.
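
    The same pattern generalizes to any column or combination of columns; for example, to dedupe on the second field, or on the first two fields together (sketches, not from the original answer):

     awk '!a[$2]++' file
     awk '!a[$1,$2]++' file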

  • 2021-01-05 06:57

    Try this command:

    awk '!x[$1]++ { print $1, $2 }' file
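
    One caveat: { print $1, $2 } emits only the first two fields, so a kept line with more fields (such as the duration rows) would be truncated; drop the action block if you want whole lines. On the sample data the surviving lines have exactly two fields anyway, so the expected output (assuming the data shown in the question) is:

    123456.451 entered-auto_attendant
    139651.526 entered-auto_attendant
    139382.537 entered-auto_attendant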
    
  • 2021-01-05 07:07

    Using awk:

    awk '!($1 in a){a[$1]++; next} $1 in a' file

    output:

    123456.451 duration:76 real:76
    139651.526 duration:62 real:62
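
    To unpack the logic: the first rule consumes the first line seen for each key (next skips it) and records the key; every later line with an already-seen key falls through to the second pattern, which is then always true, so the default action prints it. The net effect is to keep the duplicate rows (here, the duration lines) and drop the first occurrences. A commented expansion with the same behavior (my annotation):

     awk '
         !($1 in a) { a[$1]++; next }  # first sighting: remember key, skip line
         $1 in a                       # later sightings: always true here, print
     ' file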
    