I need to calculate a SHA-256 hash of a large file (or a portion of it). My implementation works fine, but it's much slower than C++'s CryptoPP calculation (25 Min. vs. 10
I think this difference in performance might only be platform related. Try changing the buffer size and see if there are any improvements. If not, I would go with JNI (Java Native Interface). Just call the C++ implementation from Java.
Since you apparently have a fast, working C++ implementation, you could build a JNI bridge and call it from Java. Alternatively, rather than reinventing the wheel (and it's a big one), you could use a ready-made library such as BouncyCastle, which is designed to cover the cryptographic needs of a program.
My explanation may not solve your problem since it depends a lot on your actual runtime environment, but when I run your code on my system, the throughput is limited by disk I/O and not the hash calculation. The problem is not solved by switching to NIO, but is simply caused by the fact that you're reading the file in very small pieces (16kB). Increasing the buffer size (buff) on my system to 1MB instead of 16kB more than doubles the throughput, but with >50MB/s, I am still limited by disk speed and not able to fully load a single CPU core.
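To check the effect of the buffer size on your own machine, something like the following could be used. This is only a minimal sketch, not the code from the question: the class name BufferSizeTest and the method sha256 are made up for illustration, it reads through a plain FileInputStream, and it uses try-with-resources, so it assumes Java 7 or later.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original code.
class BufferSizeTest {

    // Hashes the whole file with SHA-256, reading it in chunks of bufferSize bytes.
    public static byte[] sha256(String path, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }

    public static void main(String[] args) throws Exception {
        String path = args[0];
        double megabytes = new File(path).length() / (1024.0 * 1024.0);
        int[] sizes = {16 * 1024, 1024 * 1024}; // 16kB vs. 1MB
        for (int size : sizes) {
            long start = System.nanoTime();
            sha256(path, size);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("buffer %d bytes: %.1f MB/s%n", size, megabytes / seconds);
        }
    }
}
```

Keep in mind that the second pass may be served from the OS file cache, so use a file larger than your RAM or swap the order of the two sizes before drawing conclusions.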
BTW: You can simplify your implementation a lot by wrapping a DigestInputStream around a FileInputStream, read through the file and get the calculated hash from the DigestInputStream instead of manually shuffling the data from a RandomAccessFile to the MessageDigest as in your code.
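A minimal sketch of the DigestInputStream approach could look like this. The class name DigestStreamExample and the 1MB buffer are my own choices for illustration, and the try-with-resources syntax assumes Java 7 or later.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original code.
class DigestStreamExample {

    public static byte[] sha256(String path) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[1024 * 1024]; // 1MB read buffer (assumed size)
        try (DigestInputStream in =
                 new DigestInputStream(new FileInputStream(path), digest)) {
            // Every read updates the digest as a side effect,
            // so the loop body has nothing left to do.
            while (in.read(buffer, 0, buffer.length) != -1) {
                // intentionally empty
            }
            return in.getMessageDigest().digest();
        }
    }
}
```

This hashes the whole file only. For a portion of the file you would still need to skip to the start offset and stop after the desired number of bytes.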
I did a few performance tests with older Java versions and there seems to be a significant difference between Java 5 and Java 6 here. I'm not sure, though, whether the SHA implementation has been optimized or the VM simply executes the code much faster. The throughputs I get with the different Java versions (1MB buffer) are:
I was a little bit curious about the impact of the assembler part in the CryptoPP SHA implementation, as the benchmark results indicate that the SHA-256 algorithm only requires 15.8 CPU cycles/byte on an Opteron. Unfortunately, I was not able to build CryptoPP with gcc on cygwin (the build succeeded, but the generated exe failed immediately). Instead, I built a performance benchmark with VS2005 (default release configuration), with and without assembler support in CryptoPP, and compared it to the Java SHA implementation on an in-memory buffer, leaving out any disk I/O. I get the following results on a 2.5GHz Phenom:
Both benchmarks compute the SHA hash of a 4GB empty byte array, iterating over it in chunks of 1MB, which are passed to MessageDigest#update (Java) or CryptoPP's SHA256.Update function (C++).
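For reference, the Java side of such an in-memory benchmark can be sketched roughly as below. This is not the exact benchmark code: the class name InMemoryShaBenchmark and the method throughputMBps are made up, and since a single Java array cannot hold 4GB, the sketch assumes one zero-filled 1MB chunk is passed to MessageDigest#update repeatedly.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical benchmark class, not the code used for the numbers above.
class InMemoryShaBenchmark {

    // Feeds totalBytes zero bytes to SHA-256 in pieces of at most chunkSize
    // bytes and returns the measured throughput in MB/s.
    public static double throughputMBps(long totalBytes, int chunkSize)
            throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] chunk = new byte[chunkSize]; // zero-filled, reused for every update
        long start = System.nanoTime();
        for (long done = 0; done < totalBytes; done += chunkSize) {
            int len = (int) Math.min(chunkSize, totalBytes - done);
            digest.update(chunk, 0, len);
        }
        digest.digest();
        double seconds = (System.nanoTime() - start) / 1e9;
        return (totalBytes / (1024.0 * 1024.0)) / seconds;
    }

    public static void main(String[] args) throws Exception {
        long total = 4L * 1024 * 1024 * 1024; // 4GB in total
        int chunk = 1024 * 1024;              // 1MB per update call
        System.out.printf("SHA-256: %.1f MB/s%n", throughputMBps(total, chunk));
    }
}
```

A single run includes JIT warm-up, so it is probably worth running the measurement a few times and ignoring the first result.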
I was able to build and benchmark CryptoPP with gcc 4.4.1 (-O3) in a virtual machine running Linux and got only approximately half the throughput compared to the results from the VS exe. I am not sure how much of the difference is attributable to the virtual machine and how much is caused by VS usually producing better code than gcc, but I have no way to get more exact results from gcc right now.
I suggest you use a profiler like JProfiler or the one integrated in NetBeans (free) to find out where the time is actually spent, and concentrate on that part.
Just a wild guess - not sure if it will help - but have you tried the Server VM? Try starting the app with java -server
and see if that helps you. The server VM compiles Java code to native code more aggressively than the default client VM does.
The MAIN reason your code is so slow is that you use a RandomAccessFile, which has always been quite slow performance-wise. I suggest using a BufferedInputStream so that you can benefit from OS-level caching for disk I/O.
The code should look something like:
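import java.io.BufferedInputStream;
import java.io.IOException;
import java.security.MessageDigest;

public static byte[] hash(MessageDigest digest, BufferedInputStream in, int bufferSize) throws IOException {
    byte[] buffer = new byte[bufferSize];
    int sizeRead;
    try {
        // Feed the stream to the digest one buffer at a time.
        while ((sizeRead = in.read(buffer)) != -1) {
            digest.update(buffer, 0, sizeRead);
        }
    } finally {
        in.close(); // close the stream even if a read fails
    }
    // digest() allocates and returns the result array itself.
    return digest.digest();
}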
Perhaps the first thing to do is work out where you are spending the most time. Can you run it through a profiler and see where most of the time is being spent?
Possible improvements: