Parallel hash computation via multiple TransformBlocks results in disarray

野性不改 2021-01-22 07:49

I'm trying to compute hashes for a whole directory in order to monitor changes later. This is relatively easy. However, if there are big files, the computation takes too much time.

1 Answer
  • 2021-01-22 08:41

    Generally you cannot share a cryptographic hash object between threads. The deeper problem is that hash functions such as MD5 are fully sequential: each block is processed using the current internal state, and that state is computed from all the previous blocks. So you cannot split a single file into chunks, hash the chunks in parallel, and still obtain the MD5 of the whole file.
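    To see why the order matters, here is a minimal Python sketch (Python is used only for illustration; the sample data and the 4096-byte chunk size are arbitrary assumptions). It uses SHA-256, which processes its input sequentially in the same way as MD5: feeding chunks into one hash object in their original order reproduces the whole-file digest, while feeding them in any other order, as can happen when parallel workers race to update a shared object, silently produces a different digest.

```python
import hashlib

# Arbitrary sample data standing in for a file's contents (10,000 bytes).
data = b"".join(bytes([i]) * 1000 for i in range(10))
CHUNK = 4096
chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]

# One hash object fed in the original order: identical to hashing everything at once.
h = hashlib.sha256()
for chunk in chunks:
    h.update(chunk)
in_order = h.hexdigest()

# Same chunks in a different order: a different message, so a different digest.
h = hashlib.sha256()
for chunk in reversed(chunks):
    h.update(chunk)
out_of_order = h.hexdigest()

print(in_order == hashlib.sha256(data).hexdigest())  # True
print(out_of_order == in_order)                      # False
```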

    There is another approach you can use, called a hash tree or Merkle tree. You choose a block size and compute a hash for each block; because the blocks are independent, this step can run in parallel. The block hashes are then concatenated and hashed again. If you have a very large number of block hashes, you can build an actual tree, as described in the Wikipedia article on hash trees. Of course, the resulting hash differs from a plain MD5 of the file and depends on the configuration parameters of the hash tree, such as the block size.
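    A minimal Python sketch of the hash-tree idea, under these assumptions: SHA-256 as the underlying hash, a hypothetical 1 MiB leaf size, pairwise (binary) combination of hashes, and the whole input held in memory for brevity. Leaf hashes are computed in a thread pool; CPython's hashlib releases the GIL while hashing large buffers, so the threads can genuinely run in parallel. `pool.map` returns results in input order, which keeps the leaves in sequence no matter which thread finishes first. The function names are illustrative only, and this simplified construction omits the leaf/node prefixes that standardized tree-hash schemes typically add.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # assumption: 1 MiB leaf blocks; any fixed size works


def leaf_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Hash each fixed-size block independently, in parallel."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    with ThreadPoolExecutor() as pool:
        # map() yields results in input order, so the leaves stay in sequence.
        return list(pool.map(lambda b: hashlib.sha256(b).digest(), blocks))


def merkle_root(hashes: list[bytes]) -> bytes:
    """Combine adjacent pairs of hashes level by level until one remains."""
    level = hashes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]  # an odd last node is hashed on its own
            nxt.append(hashlib.sha256(b"".join(pair)).digest())
        level = nxt
    return level[0]


def tree_hash(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """Hex root of the hash tree over `data` (hypothetical helper)."""
    return merkle_root(leaf_hashes(data, block_size)).hex()
```

    Changing the block size or the tree shape changes the result, so later comparisons must use the same parameters. With a single block, the root is simply the SHA-256 of the data.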

    Note that MD5 is broken. You should use SHA-256, or SHA-512 or one of its truncated SHA-512/t variants such as SHA-512/256 (faster on 64-bit processors), instead. Also note that I/O speed is often more of a bottleneck than the speed of the hash algorithm, which negates any speed advantage of hash trees. If you have many files, you can instead parallelize the hashing at the file level.
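    File-level parallelism avoids the tree entirely: each file gets its own hash object, so nothing is shared between threads and every digest is the standard SHA-256 of that file. A minimal Python sketch, assuming a 64 KiB read size and four worker threads (both arbitrary); the function names are hypothetical. On I/O-bound storage, adding more workers may not help.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

CHUNK = 1 << 16  # read size; assumption, tune for your storage


def file_sha256(path: Path) -> str:
    """Stream one file through its own SHA-256 object (never shared across threads)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_directory(root: str, workers: int = 4) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest, hashing files in parallel."""
    base = Path(root)
    files = sorted(p for p in base.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = pool.map(file_sha256, files)  # results come back in `files` order
        return {str(p.relative_to(base)): d for p, d in zip(files, digests)}
```

    Storing the returned dictionary and comparing it with a later run shows which files were added, removed, or changed.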
