Find Unique Characters in a File


I have a file with 450,000+ rows of entries. Each entry is about 7 characters long. What I want to find is the set of unique characters in this file.

For instance, if my f

22 Answers

时光说笑

2021-02-04 04:09
As requested, a pure shell-script "solution":
```
sed -e 's/./&\n/g' inputfile | sort -u
```
It's not nice, it's not fast, and the output is not exactly as specified (one character per line, and it relies on GNU sed treating `\n` in the replacement as a newline), but it should work ... mostly.

For even more ridiculousness, I present the version that dumps the output on one line:
```
sed -e 's/./&\n/g' inputfile | sort -u | while IFS= read -r c; do printf '%s' "$c"; done
```

失恋的感觉

2021-02-04 04:15
~~Alternative solution using bash:~~
```
sed 's/./\l&\n/g' inputfile | sort -u | grep -vc '^$'
```
EDIT: Sorry, I misread the question. The code above counts the unique characters (case-insensitively, since `\l` lowercases each one). Dropping the `-c` switch from `grep` prints them instead, but then this solution has no real advantage over saua's (especially since he now uses the same sed pattern instead of explicit captures).

半阙折子戏

2021-02-04 04:15

Use a set data structure. Most programming languages / standard libraries come with one flavour or another. If they don't, use a hash table (or generally, dictionary) implementation and just omit the value field. Use your characters as keys. These data structures generally filter out duplicate entries (hence the name set, from its mathematical usage: sets don't have a particular order and only unique values).
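
As a minimal sketch of the set approach in Python (the function name `unique_chars` and the sample entries are made up for illustration and are not from the question):

```python
def unique_chars(lines):
    """Return the set of distinct characters across all lines, ignoring line endings."""
    seen = set()
    for line in lines:
        # update() adds each character of the string; duplicates are dropped.
        seen.update(line.rstrip("\n"))
    return seen


sample = ["abc1234", "bcd2345", "aaa1111"]  # hypothetical 7-character entries
print("".join(sorted(unique_chars(sample))))
```

Because a set has no inherent order, the result is sorted before printing so the output is stable between runs. An open file object can be passed in place of the list, since iterating over it yields lines.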


醉梦人生

2021-02-04 04:18

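Print unique characters (ASCII and UTF-8 encoded Unicode):

```
# Collect every distinct character; adding to a set is O(1) on average.
letters = set()

# O(n) overall, where n is the total number of characters in the file.
# Line endings are characters too, so the newline ends up in the set.
with open('my_file_name', encoding='utf-8') as file:
    for line in file:
        for character in line:
            letters.add(character)

# O(k) for k unique characters; the order of a set is arbitrary.
letter_str = ''.join(letters)

print(letter_str)
```

Save as unique.py and run it with `python unique.py` (Python 3).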


别跟我提以往

2021-02-04 04:21

```
cat yourfile |
 perl -e 'while(<>){chomp;$k{$_}++ for split(//, lc $_)}print keys %k,"\n";'
```


孤独总比滥情好

2021-02-04 04:23

In C++ I would first loop through the letters of the alphabet, then run strchr() on each one with the file contents as a string. That tells you whether the letter exists; if it does, add it to the list.
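
The same idea can be sketched in Python rather than C++: the `in` operator stands in for `strchr()`, and the candidate alphabet (lowercase ASCII letters here) is an assumption, since this approach only finds characters you think to look for. The name `letters_present` is hypothetical.

```python
import string


def letters_present(text, alphabet=string.ascii_lowercase):
    """Return the characters of `alphabet` that occur in `text`, in alphabet order."""
    # One scan of the text per candidate character, like calling strchr() in a loop.
    return [ch for ch in alphabet if ch in text]


print(letters_present("abc1234\nbcd2345\n"))
```

This costs one pass over the text per candidate, so it suits a small, known alphabet. Digits or other symbols are only reported if they are added to `alphabet` explicitly.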
