I need to calculate checksums of quite large files (gigabytes). This can be accomplished using the following method:
private byte[] calcHash(string file
It seems you can use TransformBlock / TransformFinalBlock, as shown in this sample: Displaying progress updates when hashing large files
Hash algorithms are expected to handle this situation and are typically implemented with 3 functions:
hash_init() - Called to allocate resources and begin the hash.
hash_update() - Called with new data as it arrives.
hash_final() - Complete the calculation and free resources.
Look at http://www.openssl.org/docs/crypto/md5.html or http://www.openssl.org/docs/crypto/sha.html for good, standard examples in C; I'm sure there are similar libraries for your platform.
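In .NET, one class that follows the same init/update/final shape is IncrementalHash. The sketch below assumes you are on a runtime that ships it (.NET Framework 4.6.2 or later, or .NET Core); the ChunkedHashSketch and HashChunks names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class ChunkedHashSketch
{
    // Hypothetical helper: hashes a sequence of chunks without joining them in memory.
    public static byte[] HashChunks(IEnumerable<byte[]> chunks)
    {
        // "init": allocate the hash state.
        using (var hasher = IncrementalHash.CreateHash(HashAlgorithmName.MD5))
        {
            foreach (var chunk in chunks)
            {
                // "update": feed each piece of data as it arrives.
                hasher.AppendData(chunk, 0, chunk.Length);
            }

            // "final": complete the calculation and return the digest.
            return hasher.GetHashAndReset();
        }
    }
}
```

The result should match a single ComputeHash call over the concatenated data, which is an easy way to sanity-check the chunking.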
You use the TransformBlock and TransformFinalBlock methods to process the data in chunks.
// Init
MD5 md5 = MD5.Create();
int offset = 0;
// For each block:
offset += md5.TransformBlock(block, 0, block.Length, block, 0);
// For last block:
md5.TransformFinalBlock(block, 0, block.Length);
// Get the hash code
byte[] hash = md5.Hash;
Note: It works (at least with the MD5 provider) to send all blocks to TransformBlock and then send an empty block to TransformFinalBlock to finalise the process.
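Put together for a file on disk, the chunked approach might look like the sketch below. The FileHashSketch and ComputeMd5 names, the 1 MB buffer size and the progress callback are all my own assumptions for illustration, not part of any library.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHashSketch
{
    // Hypothetical helper: hashes a file in 1 MB chunks and optionally reports progress.
    public static byte[] ComputeMd5(string path, Action<long, long> onProgress = null)
    {
        var buffer = new byte[1024 * 1024];
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            long total = stream.Length;
            long done = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Hash only the bytes actually read; no output buffer is needed.
                md5.TransformBlock(buffer, 0, read, null, 0);
                done += read;
                if (onProgress != null)
                {
                    onProgress(done, total);
                }
            }

            // An empty final block finalises the hash.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

Only one buffer is ever held in memory, so the file size does not matter; the callback receives the bytes hashed so far and the total length.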
I like the answer above, but for the sake of completeness, and as a more general solution, refer to the CryptoStream class. If you are already handling streams, it is easy to wrap your stream in a CryptoStream, passing a HashAlgorithm as the ICryptoTransform parameter.
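// Data written to the CryptoStream is hashed and passed through to the file.
var file = new FileStream("foo.txt", FileMode.Open, FileAccess.Write);
var md5 = MD5.Create();
var cs = new CryptoStream(file, md5, CryptoStreamMode.Write);
while (notDoneYet)
{
    byte[] buffer = Get32MB();
    cs.Write(buffer, 0, buffer.Length);
}
// Finalise the hash; md5.Hash is not valid until the final block has been processed.
cs.FlushFinalBlock();
System.Console.WriteLine(BitConverter.ToString(md5.Hash));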
The hash has to be finalised before you read it (so the HashAlgorithm knows it's done), either by calling FlushFinalBlock on the CryptoStream or by closing it.
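If you only want the hash of an existing stream and are not writing the data anywhere, the same idea should work in read mode: wrap the source in a CryptoStream and drain it. This is a sketch under that assumption; CryptoStreamHashSketch and Md5OfStream are hypothetical names, and note that disposing the CryptoStream also closes the stream you passed in.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CryptoStreamHashSketch
{
    // Hypothetical helper: returns the MD5 of everything left in the input stream.
    public static string Md5OfStream(Stream input)
    {
        using (var md5 = MD5.Create())
        {
            using (var cs = new CryptoStream(input, md5, CryptoStreamMode.Read))
            {
                // Draining the CryptoStream pushes every byte through the hash;
                // Stream.Null discards the pass-through data.
                cs.CopyTo(Stream.Null);
            }

            // The CryptoStream has reached the end of the input, so the hash is final.
            return BitConverter.ToString(md5.Hash);
        }
    }
}
```

For a file you would pass File.OpenRead(path); CopyTo reads in modest chunks internally, so memory use stays flat regardless of file size.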
I've just had to do something similar, but wanted to read the file asynchronously. It's using TransformBlock and TransformFinalBlock and is giving me answers consistent with Azure, so I think it is correct!
private static async Task<string> CalculateMD5Async(string fullFileName)
{
    var block = ArrayPool<byte>.Shared.Rent(8192);
    try
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = new FileStream(fullFileName, FileMode.Open, FileAccess.Read, FileShare.Read, 8192, true))
            {
                int length;
                while ((length = await stream.ReadAsync(block, 0, block.Length).ConfigureAwait(false)) > 0)
                {
                    md5.TransformBlock(block, 0, length, null, 0);
                }
                md5.TransformFinalBlock(block, 0, 0);
            }
            var hash = md5.Hash;
            return Convert.ToBase64String(hash);
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(block);
    }
}