Find Unique Characters in a File

耶瑟儿~ 2021-02-04 03:30

I have a file with 450,000+ rows of entries. Each entry is about 7 characters in length. What I want to know is the unique characters of this file.

For instance, if my f

22 Answers
  •  失恋的感觉
    2021-02-04 04:25

    BASH shell script version (no sed/awk):

    while read -r -n 1 char; do [ -n "$char" ] && echo "$char"; done < entry.txt | tr '[:upper:]' '[:lower:]' | sort -u
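    The pipeline lowercases each character with tr and then lets sort -u collapse the duplicates. The same collect, sort, deduplicate idea can be written without a shell. Below is a minimal C++ sketch, which is not part of the original answer. It assumes single-byte (e.g. ASCII) input that has already been read into a std::string, and the function name unique_chars_sorted is hypothetical:

    ```cpp
    #include <algorithm>
    #include <cctype>
    #include <string>

    // Hypothetical helper mirroring the shell pipeline, done in memory:
    // drop whitespace, lowercase each character (the tr step),
    // then sort and erase adjacent duplicates (the sort -u step).
    std::string unique_chars_sorted(const std::string& text) {
        std::string chars;
        for (char ch : text) {
            // <cctype> functions need a non-negative value, so go through unsigned char
            unsigned char uc = static_cast<unsigned char>(ch);
            if (!std::isspace(uc)) {
                chars.push_back(static_cast<char>(std::tolower(uc)));
            }
        }
        std::sort(chars.begin(), chars.end());
        // std::unique moves duplicates to the tail; erase trims them off
        chars.erase(std::unique(chars.begin(), chars.end()), chars.end());
        return chars;
    }
    ```

    Sorting by char value matches LC_ALL=C sort -u for ASCII input. Under other locales, sort's collation order may differ.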
    

    UPDATE: Just for the heck of it, since I was bored and still thinking about this problem, here's a C++ version using std::set. If run time is important, this would be my recommended option: the C++ version takes slightly more than half a second to process a file with 450,000+ entries.

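    #include <cctype>    // std::isspace, std::tolower
    #include <iostream>
    #include <set>
    
    int main() {
        std::set<char> seen_chars;  // a set stores each character once, in sorted order
        std::set<char>::const_iterator iter;
        char ch;
    
        /* ignore whitespace and case */
        while ( std::cin.get(ch) ) {
            // <cctype> functions need a non-negative value, so go through unsigned char
            unsigned char uc = static_cast<unsigned char>(ch);
            if (! std::isspace(uc) ) {
                seen_chars.insert(static_cast<char>(std::tolower(uc)));
            }
        }
    
        for( iter = seen_chars.begin(); iter != seen_chars.end(); ++iter ) {
            std::cout << *iter << std::endl;
        }
    
        return 0;
    }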
    

    Note that I'm ignoring whitespace and matching case-insensitively, as requested.

    For a 450,000+ entry file (chars.txt), here's a sample run time:

    [user@host]$ g++ -o unique_chars unique_chars.cpp 
    [user@host]$ time ./unique_chars < chars.txt
    a
    b
    d
    o
    y
    
    real    0m0.638s
    user    0m0.612s
    sys     0m0.017s
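    A char can hold only 256 distinct byte values, so a std::set is more machinery than strictly needed. One possible alternative is a fixed table of 256 flags indexed by byte value. This is a swapped-in technique, not the answerer's, and it has not been timed here. The sketch below assumes single-byte input held in a std::string, and unique_chars_table is a hypothetical name:

    ```cpp
    #include <array>
    #include <cctype>
    #include <cstddef>
    #include <string>

    // Hypothetical alternative to std::set: mark each seen byte in a
    // 256-entry table, then read the table back in byte order.
    std::string unique_chars_table(const std::string& text) {
        std::array<bool, 256> seen{};  // value-initialized: all false
        for (char ch : text) {
            unsigned char uc = static_cast<unsigned char>(ch);
            if (!std::isspace(uc)) {
                seen[static_cast<unsigned char>(std::tolower(uc))] = true;
            }
        }
        // Walk the table in byte order so each seen character is emitted once.
        std::string result;
        for (std::size_t i = 0; i < seen.size(); ++i) {
            if (seen[i]) {
                result.push_back(static_cast<char>(i));
            }
        }
        return result;
    }
    ```

    For input made only of the letters in the sample output above (in any case), this returns "abdoy". Bytes above 127 come out after ASCII here. By contrast, std::set<char> places them first on platforms where char is signed.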
    
