How to delete duplicate lines in a file without sorting it in Unix?

Happy的楠姐  2020-11-22 17:26

Is there a way to delete duplicate lines in a file in Unix?

I can do it with the sort -u and uniq commands, but I want to use sed.

9 Answers
  •  南笙
     2020-11-22 17:55

    uniq would be fooled by trailing spaces and tabs. To emulate how a human compares lines, I trim all trailing spaces and tabs before comparing.
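
    For example (lines invented here), uniq compares bytes, so a trailing
    space makes otherwise identical lines look different; cat -A marks the
    line ends so the stray space is visible:

    $ printf 'foo\nfoo \nfoo\n' | uniq | cat -A
    foo$
    foo $
    foo$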

    I think the $!N; needs curly braces around the rest of the script, otherwise it continues on the last line, and that is the cause of the infinite loop.
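
    (The braces referred to would wrap the rest of the script so that nothing
    but the automatic print runs on the last line. A minimal sketch, assuming
    the one-liner in question is the classic adjacent-duplicate one:)

    # unbraced classic form (from the well-known sed one-liners):
    sed '$!N; /^\(.*\)\n\1$/!P; D' file
    # the same with the whole body inside $!{ ... }, as suggested above:
    sed -E '$!{ N; /^(.*)\n\1$/!P; D; }' file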

    I have bash 5.0 and sed 4.7 on Ubuntu 20.10. The second one-liner did not work; it failed at the character-set match.

    Three variations: the first eliminates adjacent repeated lines, the second eliminates repeated lines wherever they occur, and the third eliminates all but the last instance of each line in the file.


    # The first line in a set of duplicate lines is kept; the rest are deleted.
    # Trailing spaces and tabs are trimmed so the comparison matches what a
    # human sees. Use after norepeat() to dedupe blank lines.
    
    dedupe() {
     sed -E '
      $!{
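       # N appends the next input line to the pattern space; trailing
       # whitespace at its end is trimmed; P prints the first line only
       # when the two lines differ; D drops it and restarts with the rest.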
       N;
       s/[ \t]+$//;
       /^(.*)\n\1$/!P;
       D;
      }
     ';
    }
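
    A quick check of dedupe with invented sample input, after pasting the
    function into an interactive shell (GNU sed assumed, as the answer
    states): only adjacent repeats collapse, so the final a survives even
    though a appeared earlier.

    $ printf 'a\na\nb\nb\na\n' | dedupe
    a
    b
    a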
    
    # Delete duplicate, nonconsecutive lines from a file. Ignore blank
    # lines. Trailing spaces and tabs are trimmed to humanize comparisons,
    # and consecutive blank lines are squeezed to one.
    
    norepeat() {
     sed -n -E '
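      # Trim trailing whitespace, then G appends the hold space (every line
      # kept so far). A blank line following another blank is dropped, as is
      # any line that already appears among the kept lines; otherwise h saves
      # the new history and P prints the current line.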
      s/[ \t]+$//;
      G;
      /^(\n){2,}/d;
      /^([^\n]+).*\n\1(\n|$)/d;
      h;
      P;
      ';
    }
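
    A similar check of norepeat with invented input: later repeats of a and
    b are dropped no matter where they occur.

    $ printf 'a\nb\na\nb\nc\n' | norepeat
    a
    b
    c

    # Keep only the last occurrence of each line; earlier repeats are deleted.
    # Trailing spaces and tabs are trimmed, and blank lines are squeezed to one.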
    
    lastrepeat() {
     sed -n -E '
      s/[ \t]+$//;
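      # blank lines are stashed in the hold space and squeezed to at most
      # one blank when the accumulated result is printed at the end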
      /^$/{
       H;
       d;
      };
      G;
      # delete previous repeated line if found
      s/^([^\n]+)(.*)(\n\1(\n.*|$))/\1\2\4/;
      # after searching for previous repeat, move tested last line to end
      s/^([^\n]+)(\n)(.*)/\3\2\1/;
      $!{
       h;
       d;
      };
      # squeeze blank lines to one
      s/(\n){3,}/\n\n/g;
      s/^\n//;
      p;
     ';
    }
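
    And a check of lastrepeat with invented input: only the last occurrence
    of a is kept, and because the result is assembled in the hold space it
    is printed once, at the end of input.

    $ printf 'a\nb\na\nc\n' | lastrepeat
    b
    a
    c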
    
