I have a system with roughly 100 million documents, and I'd like to keep track of their modifications between mirrors. In order to exchange information about modifications ef
How about

    var hash = X(documents, 0, function(document) { ... });

where X is an aggregate XOR (JavaScript-style pseudocode):

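    function X(documents, x, f)
    {
        // Fold each document's hash into the running XOR.
        // JavaScript's ^ operates on 32-bit integers, so f() should return one.
        for (var document of documents)
        {
            x ^= f(document);
        }
        return x;
    }
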
and f() is a hash of each individual document's information (its timestamp, filename, ID, or whatever else identifies a version)?
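To make the pseudocode concrete, here is a minimal runnable sketch. The answer leaves f() open, so this version assumes each document is an object with hypothetical `id` and `modified` fields and uses 32-bit FNV-1a as the hash (a swapped-in choice; any well-distributed hash would do). A 32-bit hash suits JavaScript because its bitwise operators, including `^`, work on 32-bit integers.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units (one possible hash for f).
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return h >>> 0; // unsigned 32-bit result
}

// f: hash of one document's identifying information.
// `id` and `modified` are hypothetical field names.
function f(doc) {
  return fnv1a32(doc.id + "|" + doc.modified);
}

// X: aggregate XOR of f() over all documents, starting from x.
function X(documents, x, f) {
  for (const doc of documents) {
    x ^= f(doc);
  }
  return x >>> 0; // report as unsigned 32-bit
}

const docs = [
  { id: "a", modified: "2020-01-01" },
  { id: "b", modified: "2020-01-02" },
  { id: "c", modified: "2020-01-03" },
];

// XOR is commutative and associative, so enumeration order does not matter.
console.log(X(docs, 0, f) === X([...docs].reverse(), 0, f)); // true
```

Order independence is useful here because two mirrors may not enumerate their documents in the same order.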
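Because XOR is its own inverse (a ^ b ^ b == a), you can "subtract" a document out of the aggregate by XORing its hash in a second time. And because each document is hashed individually before being combined, a change to any single document changes its f() value and therefore the aggregate, which preserves the hash-like property of detecting small changes.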
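The "subtract" property means a mirror can maintain the aggregate incrementally: when a document changes, XOR out the old version's hash and XOR in the new one, instead of rehashing every document. A minimal sketch, assuming documents are identified by hypothetical `id` and `modified` strings and again using 32-bit FNV-1a as f (an arbitrary choice); the `XorAggregate` class is illustrative, not an existing API.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units (one possible f).
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return h >>> 0; // unsigned 32-bit result
}

// Hypothetical per-document hash combining the id and modification stamp.
function f(id, modified) {
  return fnv1a32(id + "|" + modified);
}

// Running XOR of f() over every document a mirror currently holds.
class XorAggregate {
  constructor() {
    this.value = 0;
  }
  add(id, modified) {
    this.value = (this.value ^ f(id, modified)) >>> 0;
  }
  // XOR is its own inverse, so removing is the same operation as adding.
  remove(id, modified) {
    this.add(id, modified);
  }
  // A modification: XOR out the old version, XOR in the new one.
  update(id, oldModified, newModified) {
    this.remove(id, oldModified);
    this.add(id, newModified);
  }
}

// Two mirrors holding the same document versions agree...
const mirrorA = new XorAggregate();
const mirrorB = new XorAggregate();
for (const m of [mirrorA, mirrorB]) {
  m.add("doc1", "2020-01-01");
  m.add("doc2", "2020-01-01");
}
console.log(mirrorA.value === mirrorB.value); // true

// ...and diverge as soon as one of them records a modification.
mirrorB.update("doc2", "2020-01-01", "2020-02-01");
console.log(mirrorA.value === mirrorB.value); // false
```

Equal aggregates mean the mirrors very probably hold the same versions; unequal aggregates only say that they differ, not which documents. One XOR caveat: adding the same (id, modified) pair twice cancels it out, so each document version must be counted exactly once.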