I have a 2 GB file (iputfile.txt
) in which every line in the file is a word, just like:
apple
red
beautiful
smell
spark
input
You aren't comparing the same thing. The Java program reads lines, depening on the newline, while the C++ program reads white space delimited "words", which is a little extra work.
Try istream::getline
.
Later
You might also try and do an elementary read operation to read a byte array and scan this for newlines.
Even later
On my old Linux notebook, jdk1.7.0_21 and don't-tell-me-it's-old 4.3.3 take about the same time, comparing with C++ getline. (We have established that reading words is slower.) There isn't much difference between -O0 and -O2, which doesn't surprise me, given the simplicity of the code in the loop.
Last note As I suggested, fin.read(buffer,LEN) with LEN = 1MB and using memchr to scan for '\n' results in another speed improvement of about 20%, which makes C (there isn't any C++ left by now) faster than Java.