Find Unique Characters in a File

耶瑟儿~ 2021-02-04 03:30

I have a file with 450,000+ rows of entries. Each entry is about 7 characters long. What I want to know is which unique characters appear in this file.

For instance, if my f

22 Answers
  • 2021-02-04 04:09

    As requested, a pure shell-script "solution":

    sed -e "s/./\0\n/g" inputfile | sort -u
    

    It's not nice, it's not fast and the output is not exactly as specified, but it should work ... mostly.

    For even more ridiculousness, I present the version that dumps the output on one line:

    sed -e "s/./\0\n/g" inputfile | sort -u | while IFS= read -r c; do printf '%s' "$c"; done
    
  • 2021-02-04 04:15

    Alternative solution using bash:

    sed "s/./\l\0\n/g" inputfile | sort -u | grep -vc ^$
    

    EDIT: Sorry, I actually misread the question. The above code counts the unique characters. Simply omitting the -c switch at the end does the trick, but then this solution has no real advantage over saua's (especially since he now uses the same sed pattern instead of explicit captures).

  • 2021-02-04 04:15

    Use a set data structure. Most programming languages and standard libraries come with one flavour or another. If yours doesn't, use a hash table (or, more generally, a dictionary) implementation and just ignore the value field, using your characters as keys. These data structures filter out duplicate entries (hence the name set, from its mathematical usage: a set has no particular order and contains only unique values).
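
    As a minimal sketch of both variants described above, here is one possible Python version. The function names and the sample string are hypothetical and chosen only for illustration; the dictionary variant stands in for a language that lacks a set type.

```python
def unique_with_set(text):
    """Collect distinct characters using the built-in set type."""
    return set(text)


def unique_with_dict(text):
    """Fallback: a dictionary whose keys are the characters; values are ignored."""
    seen = {}
    for ch in text:
        seen[ch] = None  # placeholder value, only the key matters
    return list(seen)  # dict keys are unique by construction


sample = "abcabc\nxyz"  # hypothetical sample data, not from the question
print(sorted(unique_with_set(sample)))
print(sorted(unique_with_dict(sample)))
```

    Both functions treat the newline as a character like any other, so strip line endings first if they should not be reported.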

  • 2021-02-04 04:18

    Print unique characters (ASCII and Unicode UTF-8)

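    # Python 3: the built-in open() decodes UTF-8 directly
    letters = set()

    # Runtime: O(n) in the total number of characters,
    # since each set insertion is O(1) on average
    with open('my_file_name', encoding='utf-8') as file:
        for line in file:
            # Drop the line ending so it is not reported as a character
            for character in line.rstrip('\n'):
                letters.add(character)

    # Sorting makes the output order deterministic
    letter_str = ''.join(sorted(letters))

    print(letter_str)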
    

    Save as unique.py, and run as python unique.py.

  • 2021-02-04 04:21
    perl -e 'while(<>){chomp;$k{$_}++ for split(//, lc $_)}print keys %k,"\n";' yourfile
    
  • 2021-02-04 04:23

    In C++, I would first loop through the letters of the alphabet, then run strchr() on each one with the file's contents as a string. This tells you whether that letter exists in the file; if it does, add it to the list.
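
    The same idea can be sketched in Python rather than C++, with the `in` operator standing in for strchr(). The function name, the default alphabet, and the sample string are assumptions made for illustration only.

```python
import string


def letters_present(text, alphabet=string.ascii_lowercase):
    """Return the characters of `alphabet` that occur in `text`, in alphabet order."""
    found = []
    for letter in alphabet:
        # `in` on a str is a substring search, comparable to strchr() != NULL
        if letter in text:
            found.append(letter)
    return found


sample = "hello\nworld\n"  # hypothetical stand-in for the file contents
print(''.join(letters_present(sample)))
```

    Note that this approach only reports characters from the alphabet you pass in, and it scans the text once per candidate letter, so it suits a small, known alphabet better than arbitrary Unicode input.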
