Find Unique Characters in a File


I have a file with 450,000+ rows of entries. Each entry is about 7 characters in length. What I want to know is the unique characters of this file.

For instance, if my f


22 Answers

半阙折子戏

2021-02-04 03:59

Python w/sets (quick and dirty)

with open("data.txt", "r") as f:
    s = f.read()
print("Unique Characters: {%s}" % ''.join(set(s)))

Python w/sets (with nicer output)

import re

with open("data.txt", "r") as f:
    text = f.read().lower()
# Ignore non-word characters (keeps letters, digits and underscore)
unique = re.sub(r'\W', '', ''.join(set(text)))

print("Unique Characters: {%s}" % unique)


粉色の甜心

2021-02-04 04:01
A very fast solution would be to make a small C program that reads its standard input, does the aggregation and spits out the result.

Why the arbitrary limitation that you need a "script" that does it?

What exactly is a script anyway?

Would Python do?

If so, then this is one solution:
```
import sys

s = set()
for line in sys.stdin:
    # Strip trailing whitespace, fold case, and record each character
    for c in line.rstrip().lower():
        s.add(c)

print("".join(sorted(s)))
```
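
The same line-by-line idea can be written a little more compactly by letting `set.update` consume a whole line at once. This is only a sketch: the helper name `unique_chars` is made up for illustration, and it assumes that lowercase folding is wanted and that only the line ending (not other trailing whitespace) should be dropped.

```python
import io


def unique_chars(lines):
    """Return the sorted, lowercased unique characters in an iterable of lines."""
    seen = set()
    for line in lines:
        # Drop the line ending, fold case, and add every character in one call.
        seen.update(line.rstrip("\r\n").lower())
    return "".join(sorted(seen))


if __name__ == "__main__":
    # In-memory stand-in for the real file; sys.stdin would work the same way.
    sample = io.StringIO("Bob\nDaddy\n")
    print(unique_chars(sample))  # prints "abdoy"
```

Because the function accepts any iterable of strings, it can be fed an open file object directly, so the 450,000+ rows are never held in memory at once.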

感情败类

2021-02-04 04:02

A C solution. Admittedly, it is not the quickest solution in the world to write. But since it is already coded and can be cut and pasted, I think it counts as "fast to implement" for the poster :) I didn't see any other C solutions, so I wanted to post one for the pure sadistic pleasure :)

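#include <stdio.h>
#include <stdlib.h>

#define CHARSINSET 256
#define FILENAME "location.txt"

char buf[CHARSINSET + 1];

/* Build a string of every byte value seen at least once.
   Starts at 1 because a NUL byte would end the string early. */
char *getUniqueCharacters(const int *charactersInFile) {
    int x;
    char *bufptr = buf;
    for (x = 1; x < CHARSINSET; x++) {
        if (charactersInFile[x] > 0)
            *bufptr++ = (char)x;
    }
    *bufptr = '\0';  /* terminate the string */
    return buf;
}

int main(void) {
    FILE *fp;
    int c;  /* int, not char, so EOF is distinguishable from a real byte */
    int *charactersInFile = calloc(CHARSINSET, sizeof(int));
    if (charactersInFile == NULL)
        return 1;
    if (NULL == (fp = fopen(FILENAME, "r"))) {
        printf("File not found.\n");
        free(charactersInFile);
        return 1;
    }
    /* getc returns values 0..255 or EOF, so c is a safe array index */
    while ((c = getc(fp)) != EOF) {
        if (c != '\n' && c != '\r')
            charactersInFile[c]++;
    }

    fclose(fp);
    printf("Unique characters: {%s}\n", getUniqueCharacters(charactersInFile));
    free(charactersInFile);
    return 0;
}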
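
For comparison, the counting-table technique used in the C program (one counter per possible byte value, then a scan for the non-zero slots) can be sketched in Python as well. The function name `unique_bytes` is hypothetical, and the sketch assumes the input is handled as raw bytes rather than decoded text.

```python
def unique_bytes(data: bytes) -> bytes:
    """Return the distinct byte values in data, ascending, skipping CR and LF."""
    counts = [0] * 256            # one counter per possible byte value
    for b in data:                # iterating over bytes yields ints in 0..255
        counts[b] += 1
    # Keep every value that occurred at least once, except '\n' (10) and '\r' (13).
    return bytes(v for v in range(256) if counts[v] > 0 and v not in (10, 13))


if __name__ == "__main__":
    print(unique_bytes(b"Bob\r\nDaddy\n").decode("ascii"))  # prints "BDabdoy"
```

The table has a fixed size no matter how large the file is, and it yields per-character counts as a side effect if those are ever needed.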


北恋

2021-02-04 04:02

Here's a PowerShell example:

gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | sort -CaseSensitive -Unique

which produces:

D
Y
a
b
o

I like that it's easy to read.

EDIT: Here's a faster version:

$letters = @{} ; gc file.txt | select -Skip 2 | % { $_.ToCharArray() } | % { $letters[$_] = $true } ; $letters.Keys


我寻月下人不归

2021-02-04 04:05

Try this file with JSDB JavaScript (which includes the JavaScript engine used in the Firefox browser):

var seenAlreadyMap={};
var seenAlreadyArray=[];
while (!system.stdin.eof)
{
  var L = system.stdin.readLine();
  for (var i = L.length; i-- > 0; )
  {
    var c = L[i].toLowerCase();
    if (!(c in seenAlreadyMap))
    {
      seenAlreadyMap[c] = true;
      seenAlreadyArray.push(c);
    }
  }
}
system.stdout.writeln(seenAlreadyArray.sort().join(''));


长情又很酷

2021-02-04 04:05

Python without using a set.

letters = []
with open('location', 'r') as file:
    for line in file:
        for character in line:  # note: this includes the trailing '\n'
            if character not in letters:
                letters.append(character)

print(letters)
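
Another set-free option, offered only as a sketch, is to rely on dictionary keys being unique and (since Python 3.7) kept in insertion order. The helper name `unique_in_order` is invented for this example; like the list version, it keeps characters in first-seen order and does not filter out newlines.

```python
def unique_in_order(text):
    """Return the distinct characters of text in first-seen order."""
    # dict.fromkeys keeps one key per character, in order of first appearance.
    return list(dict.fromkeys(text))


if __name__ == "__main__":
    print(unique_in_order("Bob\nDaddy\n"))
    # prints ['B', 'o', 'b', '\n', 'D', 'a', 'd', 'y']
```

Key lookups in a dictionary are constant time on average, whereas `character not in letters` scans the list each time. With a small alphabet the difference is minor, but it grows with the number of distinct characters.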
