Find unique lines

情书的邮戳 2020-12-23 10:46

How can I find the unique lines and remove all duplicates from a file? My input file is

1
1
2
3
5
5
7
7

I would like the result to be:

2
3

11 Answers
  • 2020-12-23 11:35

    uniq -u has been driving me crazy because it did not work.

    So instead of that, if you have python (most Linux distros and servers already have it):

    Assuming you have the data file in notUnique.txt

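    # Python 3
    # Assumes one record per line in notUnique.txt;
    # otherwise adjust split() accordingly.

    uniqueData = []
    with open('notUnique.txt') as f:
        fileData = f.read().split('\n')

    for line in fileData:
        # Skip blank lines and lines already collected (first occurrence wins)
        if line.strip() != '' and line not in uniqueData:
            uniqueData.append(line)

    print(uniqueData)

    # Another option (fewer keystrokes, but the original order is lost):
    print(set(open('notUnique.txt').read().split('\n')))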
    

    Note that due to empty lines, the final set may contain '' or only-space strings. You can remove that later. Or just get away with copying from the terminal ;)
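
To illustrate the note above, here is a minimal sketch that drops blank and whitespace-only lines before de-duplicating, and keeps the original order. The function name `dedupe_nonblank` and the sample string are made up for this example; the sample mirrors the input from the question.

```python
def dedupe_nonblank(text):
    """Drop blank lines and repeats, keeping the first occurrence of each line."""
    lines = (line for line in text.split('\n') if line.strip())
    # dict preserves insertion order, so fromkeys() de-duplicates in order
    return list(dict.fromkeys(lines))

sample = "1\n1\n2\n3\n5\n5\n7\n7\n\n"
print(dedupe_nonblank(sample))  # ['1', '2', '3', '5', '7']
```

Unlike `set(...)`, this keeps the lines in file order and never returns `''`.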


    Just FYI, from the uniq man page:

    "Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'."

    One correct way to invoke it: sort notUnique.txt | uniq

    Example run:

    $ cat x
    3
    1
    2
    2
    2
    3
    1
    3
    
    $ uniq x
    3
    1
    2
    3
    1
    3
    
    $ uniq -u x
    3
    1
    3
    1
    3
    
    $ sort x | uniq
    1
    2
    3
    

    Spaces might be printed, so be prepared!
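
The adjacency rule from the man page can be mimicked in Python with `itertools.groupby`, which also groups only adjacent equal items. This is a rough sketch of the behaviour, not how `uniq` is actually implemented; the helper names `uniq` and `uniq_u` are chosen for this example, and the list `x` holds the same values as file x in the example run.

```python
from itertools import groupby

def uniq(lines):
    """Collapse each run of adjacent equal lines to one line (like plain uniq)."""
    return [key for key, _ in groupby(lines)]

def uniq_u(lines):
    """Keep only lines whose adjacent run has length 1 (like uniq -u)."""
    return [key for key, run in groupby(lines) if len(list(run)) == 1]

x = ["3", "1", "2", "2", "2", "3", "1", "3"]

print(uniq(x))          # ['3', '1', '2', '3', '1', '3']
print(uniq_u(x))        # ['3', '1', '3', '1', '3']
print(uniq(sorted(x)))  # ['1', '2', '3']
```

Sorting first puts equal lines next to each other, which is why the third call behaves like `sort x | uniq`.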

  • 2020-12-23 11:36

    uniq -u < file will do the job.

  • 2020-12-23 11:38

    uniq has the option you need:

       -u, --unique
              only print unique lines
    
    $ cat file.txt
    1
    1
    2
    3
    5
    5
    7
    7
    $ uniq -u file.txt
    2
    3
    
  • 2020-12-23 11:39

    This was the first thing I tried:

    skilla:~# uniq -u all.sorted  
    
    76679787
    76679787 
    76794979
    76794979 
    76869286
    76869286 
    ......
    

    Then I ran cat -e all.sorted to make the line endings visible:

    skilla:~# cat -e all.sorted 
    $
    76679787$
    76679787 $
    76701427$
    76701427$
    76794979$
    76794979 $
    76869286$
    76869286 $
    

    Every second line has a trailing space :( After removing all trailing spaces it worked!

    Thank you!
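
The trailing-space pitfall described here can be reproduced in a small sketch: count lines with and without stripping trailing whitespace. The function name `seen_once` and its `strip_trailing` flag are invented for this illustration; the values are copied from the cat -e listing, and blank lines are ignored.

```python
from collections import Counter

def seen_once(lines, strip_trailing=True):
    """Return non-blank lines that occur exactly once in the input."""
    if strip_trailing:
        lines = [line.rstrip() for line in lines]
    counts = Counter(lines)
    return [line for line in lines if line and counts[line] == 1]

# Every second value carries a trailing space, as in the listing
data = ["", "76679787", "76679787 ", "76701427", "76701427",
        "76794979", "76794979 ", "76869286", "76869286 "]

print(seen_once(data, strip_trailing=False))  # six "unique" lines
print(seen_once(data))                        # []
```

Without stripping, "76679787" and "76679787 " count as different lines, which matches the confusing `uniq -u` output; after stripping, every value is a duplicate.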

  • 2020-12-23 11:40
    sort -d "file name" | uniq -u
    

    This worked for me on a similar problem. Use it when the file is not sorted; if the file is already sorted, you can drop sort.
