I have a file with 450,000+ rows of entries. Each entry is about 7 characters in length. What I want to know is the unique characters of this file.
For instance, if my f
BASH shell script version (no sed/awk):
while read -n 1 char; do echo "$char"; done < entry.txt | tr [A-Z] [a-z] | sort -u
UPDATE: Just for the heck of it, since I was bored and still thinking about this problem, here's a C++ version using set. If run time is important this would be my recommended option, since the C++ version takes slightly more than half a second to process a file with 450,000+ entries.
#include
#include
int main() {
std::set seen_chars;
std::set::const_iterator iter;
char ch;
/* ignore whitespace and case */
while ( std::cin.get(ch) ) {
if (! isspace(ch) ) {
seen_chars.insert(tolower(ch));
}
}
for( iter = seen_chars.begin(); iter != seen_chars.end(); ++iter ) {
std::cout << *iter << std::endl;
}
return 0;
}
Note that I'm ignoring whitespace and it's case insensitive as requested.
For a 450,000+ entry file (chars.txt), here's a sample run time:
[user@host]$ g++ -o unique_chars unique_chars.cpp
[user@host]$ time ./unique_chars < chars.txt
a
b
d
o
y
real 0m0.638s
user 0m0.612s
sys 0m0.017s