How to determine file size in terms of number of characters?

会有一股神秘感。 submitted on 2020-01-04 06:35:07

Question


I am reading a file using Java and jcifs on Windows. I need to determine the size of the file in characters; it contains multi-byte as well as ASCII characters.

How can I achieve this efficiently? Or is there an existing API in Java for it?

Thanks,


Answer 1:


To get the character count, you'll have to read the file. By specifying the correct file encoding, you ensure that Java correctly reads each character in your file.

BufferedReader.read() returns the character read as an int in the range 0 to 65535 (a single UTF-16 code unit), or -1 at the end of the stream. So the simple way to do it would be like this:

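// Needs java.io.* (File, FileInputStream, InputStreamReader, BufferedReader, IOException).
int countCharsSimple(File f, String charsetName) throws IOException {
    // try-with-resources closes the reader (and the file) even if read() throws
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), charsetName))) {
        int charCount = 0;
        while (reader.read() > -1) {
            charCount++;
        }
        return charCount;
    }
}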
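Because read() hands back UTF-16 code units, a character outside the Basic Multilingual Plane (for example an emoji) is counted twice by the loop above. If you need Unicode code points instead, one option is to skip a low surrogate that directly follows a high surrogate. This is a sketch: the class and method names are hypothetical, and it reads from any Reader so the charset is still chosen when the InputStreamReader is built.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CodePointCounter {
    // Counts the way countCharsSimple does: one UTF-16 code unit per read().
    static int countUtf16Units(Reader reader) throws IOException {
        int count = 0;
        while (reader.read() != -1) {
            count++;
        }
        return count;
    }

    // Counts Unicode code points: a high surrogate followed by a low surrogate counts once.
    static int countCodePoints(Reader reader) throws IOException {
        int count = 0;
        int prev = -1;
        int c;
        while ((c = reader.read()) != -1) {
            // A low surrogate right after a high surrogate completes the previous code point.
            boolean completesPair = prev != -1
                    && Character.isHighSurrogate((char) prev)
                    && Character.isLowSurrogate((char) c);
            if (!completesPair) {
                count++;
            }
            prev = c;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // 'a', U+1F600 (stored in UTF-16 as a surrogate pair), 'b'
        byte[] utf8 = "a\uD83D\uDE00b".getBytes(StandardCharsets.UTF_8);
        try (Reader r1 = new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
             Reader r2 = new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            System.out.println(countUtf16Units(r1)); // 4
            System.out.println(countCodePoints(r2)); // 3
        }
    }
}
```

For text that stays inside the BMP (most European and CJK text) both counts agree, so the extra check only matters if the file may contain supplementary characters.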

You will get faster performance using Reader.read(char[]):

// Needs java.io.*; reads up to 1024 chars per call instead of one at a time.
int countCharsBuffer(File f, String charsetName) throws IOException {
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), charsetName))) {
        int charCount = 0;
        char[] cbuf = new char[1024];
        int read;
        while ((read = reader.read(cbuf)) > -1) {
            charCount += read;
        }
        return charCount;
    }
}
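The question reads the file through jcifs rather than java.io.File. The buffered loop only needs a java.io.InputStream, so a variant that takes a stream should work with whatever stream your jcifs version gives you for the remote file (jcifs 1.x has jcifs.smb.SmbFileInputStream, but check your version). The sketch below is an assumption-laden adaptation, not part of the original answer: the class name is hypothetical, it counts into a long to avoid int overflow on very large files, and it demos with an in-memory stream. The demo also shows why the charset matters: decoding the same bytes as ISO-8859-1 turns every byte into one character.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamCharCounter {
    // Counts UTF-16 chars decoded from any InputStream (local file, SMB stream, etc.).
    // Closing the reader also closes the underlying stream.
    static long countChars(InputStream in, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] cbuf = new char[8192];
            long count = 0;
            int read;
            while ((read = reader.read(cbuf)) != -1) {
                count += read;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // "naïve café\n": 11 characters, but 'ï' and 'é' take two bytes each in UTF-8.
        byte[] data = "na\u00efve caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(data.length);                                                     // 13
        System.out.println(countChars(new ByteArrayInputStream(data), StandardCharsets.UTF_8));      // 11
        System.out.println(countChars(new ByteArrayInputStream(data), StandardCharsets.ISO_8859_1)); // 13
    }
}
```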

For interest, I benchmarked these two and the NIO version suggested in Andrey's answer. I found the second example above (countCharsBuffer) to be the fastest.

(Note that all these examples include line separator characters in their counts.)
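If line separators should not be counted, one option is to skip '\r' and '\n' while scanning the buffer. This is a sketch, not something the answer benchmarked; the class and method names are hypothetical, and it takes any Reader so it can wrap the same InputStreamReader used above.

```java
import java.io.*;

public class CountWithoutNewlines {
    // Counts decoded chars, skipping '\r' and '\n' so line separators are excluded.
    static long countNonNewlineChars(Reader reader) throws IOException {
        char[] cbuf = new char[8192];
        long count = 0;
        int read;
        while ((read = reader.read(cbuf)) != -1) {
            for (int i = 0; i < read; i++) {
                char c = cbuf[i];
                if (c != '\n' && c != '\r') {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Mixed Windows ("\r\n") and Unix ("\n") line endings.
        try (Reader r = new StringReader("ab\r\ncd\ne")) {
            System.out.println(countNonNewlineChars(r)); // 5
        }
    }
}
```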




Answer 2:


No doubt, to get the exact number of characters you have to read the file with the proper encoding. The question is how to read files efficiently. Java NIO is the fastest way I know of to do that:

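// Needs java.io.*, java.nio.ByteBuffer, java.nio.channels.FileChannel; the file must be < 2 GB.
byte[] barray = new byte[(int) f.length()];
ByteBuffer bb = ByteBuffer.wrap(barray);
try (FileInputStream in = new FileInputStream(f); FileChannel fChannel = in.getChannel()) {
    // a single read() may stop short of a full buffer, so keep reading until it is full
    while (bb.hasRemaining() && fChannel.read(bb) != -1) { }
}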

then

String str = new String(barray, charsetName);
int charCount = str.length(); // number of UTF-16 chars, as with the Reader-based versions

Reading into the byte buffer runs at close to the maximum available speed (for me it was about 60 Mb/sec, while a disk speed test gives about 70-75 Mb/sec).
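Reading the whole file into one byte array caps the file at about 2 GB and holds all of it in memory. A streaming alternative, sketched here with the standard java.nio.charset.CharsetDecoder API instead of the single new String(...) call above, decodes fixed-size chunks and carries any multi-byte sequence split across a chunk boundary over to the next read. The class name, the 64 KB buffer sizes, and the choice to replace malformed bytes rather than throw are all assumptions, and it has not been benchmarked against the other versions in this thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class NioCharCounter {
    // Streams the file through a CharsetDecoder in fixed-size chunks, so memory use
    // does not grow with file size. Counts UTF-16 chars, like String.length().
    static long countChars(Path path, Charset charset) throws IOException {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)       // assumption: replace bad bytes
                .onUnmappableCharacter(CodingErrorAction.REPLACE); // instead of throwing
        ByteBuffer bytes = ByteBuffer.allocate(64 * 1024);
        CharBuffer chars = CharBuffer.allocate(64 * 1024);
        long count = 0;
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            boolean eof = false;
            while (!eof) {
                eof = channel.read(bytes) == -1;
                bytes.flip();
                CoderResult result;
                do {
                    // endOfInput is true only on the final pass
                    result = decoder.decode(bytes, chars, eof);
                    count += chars.position();
                    chars.clear();
                } while (result.isOverflow());
                // Keep an incomplete multi-byte sequence at the chunk edge for the next read.
                bytes.compact();
            }
            // Emit anything the decoder still holds internally.
            decoder.flush(chars);
            count += chars.position();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("nio-count", ".txt");
        try {
            Files.write(tmp, "h\u00e9llo w\u00f6rld".getBytes(StandardCharsets.UTF_8));
            System.out.println(Files.size(tmp));                          // 13 bytes
            System.out.println(countChars(tmp, StandardCharsets.UTF_8));  // 11 chars
        } finally {
            Files.delete(tmp);
        }
    }
}
```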



Source: https://stackoverflow.com/questions/8590544/how-determining-file-size-in-term-of-number-of-characters
