I have a 10-character string key field in a database. I've used CRC32 to hash this field, but I'm worried about duplicates. Could somebody show me the probability of collision?
In the case you cite, at least one collision is essentially guaranteed. The probability of at least one collision is about $1 - 3\times10^{-51}$. The average number of collisions you would expect is about 116.
In general, the average number of collisions in k samples, each a random choice among n possible values, is:
$$k - n + n\left(1 - \frac{1}{n}\right)^{k}$$
The probability of at least one collision is:
$$1 - \frac{n!}{(n-k)!\,n^{k}}$$
In your case, $n = 2^{32}$ and $k = 10^{6}$.
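These figures can be checked numerically. The following is a minimal sketch, not necessarily how the numbers above were computed: it assumes hash values are uniformly random and uses the standard birthday-problem expressions (expected collisions k - n + n(1 - 1/n)^k, and no-collision probability equal to the product of (1 - i/n) for i = 0..k-1). The function names are illustrative.

```python
import math

N = 2**32   # number of possible CRC32 values
K = 10**6   # number of keys hashed

def expected_collisions(k: int, n: int) -> float:
    """Expected collisions: k - n + n*(1 - 1/n)**k (k minus expected distinct values)."""
    # n*(1 - 1/n)**k - n is rewritten as n*expm1(k*log1p(-1/n)) to avoid
    # cancellation between two nearly equal large numbers.
    return k + n * math.expm1(k * math.log1p(-1.0 / n))

def log_prob_no_collision(k: int, n: int) -> float:
    """Natural log of n! / ((n-k)! * n**k), summed term by term."""
    # The probability itself is tiny, so work in log space.
    return math.fsum(math.log1p(-i / n) for i in range(k))

if __name__ == "__main__":
    print("expected collisions:", expected_collisions(K, N))
    print("P(no collision):", math.exp(log_prob_no_collision(K, N)))
```

With these inputs the expected count should land near 116 and the no-collision probability near 3×10^-51, in line with the figures quoted above.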
The probability of a three-way collision in your case is about 0.01. See the Birthday Problem.
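One way to sanity-check the three-way figure is a Poisson approximation: count the expected number of key triples that share a hash value, C(k, 3) / n^2, and treat the number of such triples as Poisson-distributed. This is an assumed method for illustration, not necessarily the one used for the estimate above, and the function name is hypothetical. It requires Python 3.8+ for `math.comb`.

```python
import math

def prob_three_way_collision(k: int, n: int) -> float:
    """Approximate P(at least one 3-way collision) among k uniform hashes over n values."""
    # Expected number of triples whose three hashes all coincide:
    # each triple matches with probability 1/n**2.
    lam = math.comb(k, 3) / n**2
    # Poisson approximation: P(at least one) = 1 - exp(-lam).
    return -math.expm1(-lam)

if __name__ == "__main__":
    print(prob_three_way_collision(10**6, 2**32))
```

For k = 10^6 and n = 2^32 this gives a value just under 0.01, consistent with the rough estimate above.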
Duplicate of Expected collisions for perfect 32bit crc
The answer referenced this article: http://arstechnica.com/civis/viewtopic.php?f=20&t=149670
This article has a chart of hash collision probabilities: http://preshing.com/20110504/hash-collision-probabilities