Question
In the browser, I read in a file using the JS FileReader().readAsBinaryString(). Using the CryptoJS library I can MD5 hash the data.
This works fine, but I do not know how to handle large files. For example, just reading a 2 GiB file crashes the browser window. I can slice blobs from the file data and hash that as I go but wouldn't this prevent anyone else from verifying the same hash without following the same steps as me?
Is there a way to get the MD5 hash of a large file in this circumstance? How would you calculate the MD5 hash of a 1 TB file, for example? Do I need to read the file in as a stream?
This is my first time cutting my teeth on this, and I'm not sure how to do it.
The code below resides in an Angular directive, hence the scope.
var reader = new FileReader();
reader.onload = function (loadEvent) {
    scope.$apply(function () {
        scope.files = changeEvent.target.files;
        scope.fileread = loadEvent.target.result;
        scope.md5Data = CryptoJS.MD5(scope.fileread).toString();
    });
};
// First ten megs of the file
reader.readAsBinaryString((changeEvent.target.files[0]).slice(0, 10 * 1024 * 1024));
Answer 1:
I can slice blobs from the file data and hash that as I go but wouldn't this prevent anyone else from verifying the same hash without following the same steps as me?
No. This is exactly what the MD5 algorithm provides by design:
- you have a file
- the data is padded by appending a single '1' bit, then '0' bits, then the original length as a 64-bit value, so that the padded length is a multiple of 512 bits (64 bytes).
- each round processes one 512-bit (64-byte) block and combines it with the state left by the previous blocks.
So the result depends only on the bytes of the file, not on how you slice them while feeding the hasher. You will not need to repeat these exact steps or make sure another user does the same.
Since MD5 is computed in blocks, streaming is possible, as you can read here (although the example there uses Node.js's built-in crypto module rather than CryptoJS):
http://www.hacksparrow.com/how-to-generate-md5-sha1-sha512-sha256-checksum-hashes-in-node-js.html
Answer 2:
Use spark-md5 and Q
Since none of the other answers provided a full snippet, here's how you would calculate the MD5 hash of a large file:
function calculateMD5Hash(file, bufferSize) {
    var def = Q.defer();
    var fileReader = new FileReader();
    var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
    var hashAlgorithm = new SparkMD5();
    var totalParts = Math.ceil(file.size / bufferSize);
    var currentPart = 0;
    var startTime = new Date().getTime();

    fileReader.onload = function(e) {
        currentPart += 1;
        def.notify({
            currentPart: currentPart,
            totalParts: totalParts
        });
        var buffer = e.target.result;
        hashAlgorithm.appendBinary(buffer);
        if (currentPart < totalParts) {
            processNextPart();
            return;
        }
        def.resolve({
            hashResult: hashAlgorithm.end(),
            duration: new Date().getTime() - startTime
        });
    };

    fileReader.onerror = function(e) {
        def.reject(e);
    };

    function processNextPart() {
        var start = currentPart * bufferSize;
        var end = Math.min(start + bufferSize, file.size);
        fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
    }

    processNextPart();
    return def.promise;
}
function calculate() {
    var input = document.getElementById('file');
    if (!input.files.length) {
        return;
    }
    var file = input.files[0];
    var bufferSize = Math.pow(1024, 2) * 10; // 10MB
    calculateMD5Hash(file, bufferSize).then(
        function(result) {
            // Success
            console.log(result);
        },
        function(err) {
            // There was an error,
        },
        function(progress) {
            // We get notified of the progress as it is executed
            console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', progress.currentPart * bufferSize, 'of', progress.totalParts * bufferSize);
        });
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>
<div>
    <input type="file" id="file"/>
    <input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>
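The slicing arithmetic in processNextPart can be checked on its own. The sketch below uses a hypothetical helper, chunkRanges, that is not part of spark-md5 or Q. It lists the byte ranges the snippet above would read for a given file size and buffer size.

```javascript
// Return the [start, end) byte ranges that cover a file of fileSize bytes
// when it is read bufferSize bytes at a time. The last range is clamped to
// fileSize, mirroring Math.min(start + bufferSize, file.size) above.
// bufferSize is assumed to be a positive integer.
function chunkRanges(fileSize, bufferSize) {
  const ranges = [];
  for (let start = 0; start < fileSize; start += bufferSize) {
    ranges.push([start, Math.min(start + bufferSize, fileSize)]);
  }
  return ranges;
}

console.log(chunkRanges(25, 10));
```

For a 25-byte file read 10 bytes at a time, this yields three ranges: 0 to 10, 10 to 20, and 20 to 25. The number of ranges matches totalParts, that is Math.ceil(file.size / bufferSize), and only the last one is shorter than bufferSize.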
Answer 3:
You may want to check the section on progressive hashing on the CryptoJS site.
The example:
var sha256 = CryptoJS.algo.SHA256.create();
sha256.update("Message Part 1");
sha256.update("Message Part 2");
sha256.update("Message Part 3");
var hash = sha256.finalize();
Replace SHA256 with MD5 and presto (rename the variable as well; I'll let you choose a good name).
Answer 4:
Use SparkMD5: https://github.com/satazor/SparkMD5
var spark = new SparkMD5();
spark.append('Hi');
spark.append('there');
var hexHash = spark.end();
It also has a file-slice example.
Source: https://stackoverflow.com/questions/30420409/md5-hash-a-large-file-incrementally