How to delete duplicate lines in a file without sorting it in Unix?

Happy的楠姐 2020-11-22 17:26

Is there a way to delete duplicate lines in a file in Unix?

I can do it with the sort -u and uniq commands, but I would like to use sed.

9 answers
  • 2020-11-22 17:45

    The one-liner that Andre Miller posted works, except with recent versions of sed when the input file ends with a blank line that contains no characters. On my Mac, the CPU just spins.

    Loops forever if the last line is blank and contains no characters:

    sed '$!N; /^\(.*\)\n\1$/!P; D'

    Doesn't hang, but you lose the last line:

    sed '$d;N; /^\(.*\)\n\1$/!P; D'

    The explanation is at the very end of the sed FAQ:

    The GNU sed maintainer felt that despite the portability problems
    this would cause, changing the N command to print (rather than
    delete) the pattern space was more consistent with one's intuitions
    about how a command to "append the Next line" ought to behave.
    Another fact favoring the change was that "{N;command;}" will
    delete the last line if the file has an odd number of lines, but
    print the last line if the file has an even number of lines.

    To convert scripts which used the former behavior of N (deleting
    the pattern space upon reaching the EOF) to scripts compatible with
    all versions of sed, change a lone "N;" to "$d;N;".
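
The difference between the two guards is easiest to see on an input with an odd number of lines. The sketch below joins lines in pairs; the function names and the sample input are hypothetical and only serve to illustrate "$!N" versus "$d;N".

```shell
# Join each pair of lines with "+".
# "$!N" skips N on the last line, so an unpaired final line is still printed.
join_pairs_keep_last() {
  sed '$!N; s/\n/+/'
}

# "$d" deletes an unpaired final line before N gets a chance to run.
join_pairs_drop_last() {
  sed '$d;N; s/\n/+/'
}

printf '1\n2\n3\n' | join_pairs_keep_last   # prints "1+2" and then "3"
printf '1\n2\n3\n' | join_pairs_drop_last   # prints "1+2" only
```

Neither form relies on what a bare N does at end of input, which is the part that differs between sed versions.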

  • 2020-11-22 17:48

    From http://sed.sourceforge.net/sed1line.txt (please don't ask me how this works ;-) ):

     # delete duplicate, consecutive lines from a file (emulates "uniq").
     # First line in a set of duplicate lines is kept, rest are deleted.
     sed '$!N; /^\(.*\)\n\1$/!P; D'
    
     # delete duplicate, nonconsecutive lines from a file. Beware not to
     # overflow the buffer size of the hold space, or else use GNU sed.
     sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
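
The second one-liner can be read one command at a time. The sketch below is the same script spread over several lines with comments; the function name dedupe_keep_first and the sample input are hypothetical, and the comments are one reading of the script, not text from sed1line.txt.

```shell
# Remove nonconsecutive duplicate lines, keeping the first occurrence.
dedupe_keep_first() {
  sed -n '
    # Append the hold space (all lines kept so far) after the current line.
    G
    # Double the first newline so that every stored line, including the
    # most recent one, is preceded by a newline.
    s/\n/&&/
    # If the current line appears again as a whole line in the stored
    # text, it is a duplicate: delete it and start the next cycle.
    /^\([ -~]*\n\).*\n\1/d
    # Otherwise undo the doubling, store the buffer in the hold space,
    # and print only the first line of the buffer (the current line).
    s/\n//
    h
    P
  '
}

printf 'a\nb\na\nc\nb\n' | dedupe_keep_first   # prints a, b, c
```

Note that [ -~] matches only printable ASCII characters, so a line containing a tab or a non-ASCII character never matches the test and its repeats are not removed.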
    
  • 2020-11-22 17:52
    awk '!seen[$0]++' file.txt
    

    seen is an associative array keyed by the text of each line ($0). The first time a line appears, seen[$0] is unset, which Awk treats as false. The ! is the logical NOT operator and inverts that false to true, and Awk prints every line for which the expression is true. The post-increment ++ then raises the count, so seen[$0] == 1 after a line has been seen once, seen[$0] == 2 after the second time, and so on.
    Awk treats every value except 0 and "" (the empty string) as true. When a line that has already been seen comes up again, seen[$0] is non-zero, !seen[$0] evaluates to false, and the line is not written to the output.
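
Written out in long form, the same logic looks roughly like this. It is a sketch: the function name dedupe_awk and the sample input are hypothetical, and the explicit if statement is only there to make the test and the increment visible as separate steps.

```shell
# Keep the first occurrence of every line, in input order.
dedupe_awk() {
  awk '{
    # seen[$0] is empty (false) until this exact line has been counted.
    if (!seen[$0]) print
    # Count the line so that later copies are skipped.
    seen[$0]++
  }' "$@"
}

printf 'x\ny\nx\nz\ny\n' | dedupe_awk   # prints x, y, z
```

Because the array holds one entry per distinct line, memory use grows with the number of unique lines in the input.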

  • 2020-11-22 17:55

    Perl one-liner similar to @jonas's awk solution:

    perl -ne 'print if ! $x{$_}++' file
    

    This variation removes trailing whitespace before comparing:

    perl -lne 's/\s*$//; print if ! $x{$_}++' file
    

    This variation edits the file in-place:

    perl -i -ne 'print if ! $x{$_}++' file
    

    This variation edits the file in-place, and makes a backup file.bak

    perl -i.bak -ne 'print if ! $x{$_}++' file
    
  • 2020-11-22 17:55

    An alternative using Vim (Vi-compatible):

    Delete duplicate, consecutive lines from a file:

    vim -esu NONE +'g/\v^(.*)\n\1$/d' +wq file

    Delete duplicate, nonconsecutive and nonempty lines from a file:

    vim -esu NONE +'g/\v^(.+)$\_.{-}^\1$/d' +wq file

  • 2020-11-22 17:55

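    This can be done with awk and uniq.
    The line below displays the unique values (uniq only collapses adjacent duplicates, so repeated lines that are not next to each other are kept):

    awk '{ print }' file_name | uniq
    

    You can write these values to a new file:

    awk '{ print }' file_name | uniq > uniq_file_name
    

    The new file uniq_file_name will contain the lines of file_name with adjacent duplicates removed.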
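
Keep in mind that uniq compares each line only with the line directly before it. A small check on made-up input shows the difference from the hash-based awk approach:

```shell
# uniq on unsorted input: the second "a" is not adjacent to the first,
# so it survives.
printf 'a\nb\na\n' | uniq                # prints a, b, a

# The awk filter remembers every line it has printed, so the second "a"
# is dropped.
printf 'a\nb\na\n' | awk '!seen[$0]++'   # prints a, b
```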
