Probability of getting a duplicate value when calling GetHashCode() on strings
I want to know the probability of getting duplicate values when calling the GetHashCode() method on string instances. For instance, according to this blog post, blair and brainlessness have the same hash code (1758039503) on an x86 machine.

Large. (Sorry Jon!)

The probability of getting a hash collision among short strings is extremely large. Given a set of only ten thousand distinct short strings drawn from common words, the probability of there being at least one collision in the set is approximately 1%. If you have eighty thousand strings, the probability of there being at least one
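Figures like these come from the birthday problem. Treat each hash as an independent, uniformly random 32-bit value. This is an idealization, since real GetHashCode output is only approximately uniform. The chance that n values are all distinct is the product of (1 - i/2^32) for i from 0 to n - 1, which is roughly exp(-n(n-1)/2^33). The sketch below computes that probability. It also finds a real collision empirically. The function names are hypothetical. hash32 is a stand-in built on BLAKE2b with a 4-byte digest. It is NOT the .NET string hashing algorithm.

```python
import hashlib
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Exact birthday-problem probability that n uniformly random hash
    values in a space of 2**bits contain at least one duplicate."""
    space = 2 ** bits
    if n > space:
        return 1.0  # pigeonhole: a duplicate is guaranteed
    # log P(all distinct) = sum of log(1 - i/space) for i in 0..n-1
    log_all_distinct = sum(math.log1p(-i / space) for i in range(n))
    return -math.expm1(log_all_distinct)  # 1 - P(all distinct)


def collision_probability_approx(n: int, bits: int = 32) -> float:
    """Standard approximation: 1 - exp(-n(n-1) / (2 * 2**bits))."""
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))


def hash32(s: str) -> int:
    """Hypothetical stand-in for a 32-bit string hash (NOT .NET's algorithm):
    BLAKE2b configured with a 4-byte digest, read as an unsigned integer."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "little")


def first_collision(limit: int = 2_000_000):
    """Hash distinct strings word0, word1, ... until two share a hash.
    Returns (earlier_string, later_string, hash) or None if none found."""
    seen = {}
    for i in range(limit):
        s = f"word{i}"
        h = hash32(s)
        if h in seen:
            return seen[h], s, h
        seen[h] = s
    return None


for n in (10_000, 80_000):
    print(n, round(collision_probability(n), 4),
          round(collision_probability_approx(n), 4))
print(first_collision())
```

The exact product and the exponential approximation agree closely at these sizes. Under the uniform model, the collision search usually succeeds after tens of thousands of strings, far fewer than the 2^32 possible hash values.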