remove duplicate lines with similar prefix

无人及你 2021-01-15 17:04

I need to remove the lines in a file that are a prefix of another line, and keep only the unique ones.

From this,

abc/def/ghi/
abc/def/ghi/jkl/one/
abc/def/gh         


        
4 answers
  •  花落未央
    2021-01-15 17:54

    The following awk script does what is requested; it reads the file twice.

    • In the first pass, it builds the set of all proper prefixes of every line.
    • In the second pass, it checks whether each line is in that set and prints the line only if it is not.

    The code is:

    awk -F'/' '(NR==FNR){s="";for(i=1;i<=NF-2;i++){s=s$i"/";a[s]};next}
               {if (! ($0 in a) ) {print $0}}' file file
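
    As a quick check of the two-pass idea, here is a minimal sketch that wraps the same logic in a function and runs it on made-up data. The function name remove_prefix_lines, the array name seen, and the sample paths are assumptions for illustration only, and the sketch assumes every line ends with a trailing "/".

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two-pass awk approach.
remove_prefix_lines() {
    # $1: file with one '/'-terminated path per line; it is passed to awk twice
    awk -F'/' '
        NR == FNR {                 # first pass: record every proper prefix
            s = ""
            for (i = 1; i <= NF - 2; i++) { s = s $i "/"; seen[s] }
            next
        }
        !($0 in seen)               # second pass: print lines that are not a prefix
    ' "$1" "$1"
}

# Made-up sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc/def/ghi/' 'abc/def/ghi/jkl/one/' 'abc/def/xyz/' > "$sample"
remove_prefix_lines "$sample"
rm -f "$sample"
```

    On this sample, abc/def/ghi/ is dropped because it is a prefix of abc/def/ghi/jkl/one/, and the other two lines are printed in their original order.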
    

    You can also do it by reading the file only once, but then the whole file is stored in memory:

    awk -F'/' '{s="";for(i=1;i<=NF-2;i++){s=s$i"/";a[s]}; b[NR]=$0; next}
               END {for(i=1;i<=NR;i++){if (! (b[i] in a) ) {print b[i]}}}' file
    

    Similar to Allan's solution, but using grep -c:

    while IFS= read -r line; do (( $(grep -cF -- "$line" file) == 1 )) && echo "$line"; done < file
    

    Take into account that this construct reads the file (N+1) times, where N is the number of lines.
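
    If the repeated reads are a concern and the output order does not matter, a different technique is to sort the file first: in byte order, every line that extends a given prefix sorts directly after it, so comparing each line with its successor is enough. The sketch below is an assumption-laden alternative, not part of the answers above. The function name remove_prefix_lines_sorted is hypothetical, it also collapses exact duplicates, and it tests plain string prefixes, so it matches the awk versions only when every line ends with "/".

```shell
#!/usr/bin/env bash
# Hypothetical sort-based variant: one sort plus a single awk pass.
remove_prefix_lines_sorted() {
    # $1: file with one path per line
    LC_ALL=C sort -u -- "$1" | awk '
        # Print the previous line only if the current line does not start with it.
        NR > 1 && index($0, prev) != 1 { print prev }
        { prev = $0 }
        # The last sorted line has no successor, so it is always kept.
        END { if (NR > 0) print prev }
    '
}

# Made-up sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc/def/ghi/jkl/one/' 'abc/def/xyz/' 'abc/def/ghi/' > "$sample"
remove_prefix_lines_sorted "$sample"
rm -f "$sample"
```

    The lines come out in sorted order rather than input order, which is the price of reading the file only once.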
