What checksum algorithm should I use?

盖世英雄少女心 2020-12-23 13:49

I'm building a system that needs to detect whether blobs of bytes have been updated. Rather than storing the whole blob (they can be up to 5 MB), I'

2 answers
  • 2020-12-23 14:00

    I suggest you have a look at this SO page, CRC vs MD5/SHA1.
    Speed and collisions are discussed in this other thread.
    And as always Wikipedia is your friend.

    If I had to choose, there is one important question to answer first: do you need collisions to be impossible in practice, or at least so improbable that the risk is comparable to the chance of the Moon colliding with Earth within the next 5 minutes?

    If yes, choose the SHA family.
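As a minimal sketch of the SHA route, assuming Python and its standard `hashlib` module: store only the digest of each blob and compare digests later. SHA-256 is picked here as one member of the SHA family, and the helper names `blob_digest` and `has_changed` are hypothetical.

```python
import hashlib


def blob_digest(blob: bytes) -> str:
    """Return the SHA-256 digest of a blob as a 64-character hex string."""
    return hashlib.sha256(blob).hexdigest()


def has_changed(blob: bytes, stored_digest: str) -> bool:
    """Report whether the blob no longer matches the digest stored earlier."""
    return blob_digest(blob) != stored_digest
```

Only the 32-byte digest has to be kept per blob, whatever the blob size.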
    In your case, though, I would change the way the update check is done.
    For instance, an incremental number could be associated with the blob and sent instead of the hash; an update is requested only if the number differs on the other side. The collision probability in this case goes from ~10^-18 to ~0 (basically 0 plus the probability of a bug).
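The counter idea could look roughly like the sketch below. It assumes every write to a blob goes through a single `update` method; the class `VersionedBlob` and the function `needs_update` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class VersionedBlob:
    """A blob paired with a counter that is bumped on every write."""
    data: bytes = b""
    version: int = 0

    def update(self, new_data: bytes) -> None:
        # Every write increments the version, even if the bytes are identical.
        self.data = new_data
        self.version += 1


def needs_update(local_version: int, remote_version: int) -> bool:
    """The other side requests the blob only when the versions differ."""
    return local_version != remote_version
```

The check then compares two small integers and never reads the blob itself.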

    Edit following comments

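    I found this algorithm, Adler-32, which is good for long messages (MB). It produces a 32-bit checksum (it is not strictly a CRC), so the chance that two random blobs collide is about 2^-32, i.e. on the order of 1 in 10^9 (MD5, for comparison, is 128 bits long).
    It is fast to calculate.
    Adler-32: there is some sample code (link) at the bottom of the page.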
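For reference, Python exposes Adler-32 through the standard `zlib` module, so a sketch needs no third-party code. The helper names below are hypothetical; the streaming variant assumes the blob arrives in chunks and feeds the running value back into `zlib.adler32`.

```python
import zlib


def adler32_of(blob: bytes) -> int:
    """Adler-32 checksum of a whole blob as an unsigned 32-bit integer."""
    return zlib.adler32(blob) & 0xFFFFFFFF


def adler32_streaming(chunks) -> int:
    """Adler-32 over an iterable of byte chunks, without joining them."""
    value = 1  # Adler-32 is defined to start from 1
    for chunk in chunks:
        value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF
```

For example, `adler32_of(b"Wikipedia")` gives `0x11E60398`, and the streaming version returns the same value for `[b"Wiki", b"pedia"]`.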

  • 2020-12-23 14:03

    BLAKE2 is the fastest hash function you can use among those that are widely adopted:

    BLAKE2 is not only faster than the other good hash functions, it is even faster than MD5 or SHA-1 Source

    The winner of the SHA-3 contest was the Keccak algorithm, but it does not yet have a popular implementation and is not adopted by default in GNU/Linux distributions. BLAKE2, the successor of the SHA-3 finalist BLAKE, is faster than Keccak and is part of GNU coreutils. So on your GNU/Linux distribution you can use b2sum to compute BLAKE2 hashes.
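Outside the shell, BLAKE2 is also available in Python's standard `hashlib` (Python 3.6+). The sketch below assumes the BLAKE2b variant; the helper name `blake2_digest` and the 32-byte default digest size are illustrative choices, not something the answer prescribes.

```python
import hashlib


def blake2_digest(blob: bytes, digest_size: int = 32) -> str:
    """BLAKE2b digest of a blob as hex; digest_size is in bytes (1 to 64)."""
    return hashlib.blake2b(blob, digest_size=digest_size).hexdigest()
```

With `digest_size=64` the result is the full 512-bit BLAKE2b digest, which should correspond to what `b2sum` prints by default.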
