Can Java's hashCode produce the same value for different strings?

情歌与酒 2020-11-28 08:58

Is it possible for different strings to have the same hash code using Java's hashCode function? If it is possible, what is the probability (as a percentage)?

12 answers
  • 2020-11-28 09:31

    if it is possible then what is the % of its possibility?

    That is not a particularly meaningful question.

    However, unless there is some systemic bias in the String::hashCode function or in the way that you are generating the String objects, the probability that any two different (non-equal) String objects will have the same hash code is 1 in 2^32.

    This assumes that the Strings are chosen randomly from the set of all possible String values. If you restrict the set in various ways, the probability will vary from the above number. (For instance, the existence of the "FB" / "Ea" collision means that the probability of a collision in the set of all two-letter strings is higher than the norm.)
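
    To make the two-letter case concrete, here is a minimal sketch that enumerates every two-letter string over the ASCII letters and counts how many distinct hash codes appear. The class name TwoLetterCollisions and the choice of alphabet are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the JDK.
class TwoLetterCollisions {
    // Assumed alphabet: upper- and lower-case ASCII letters only.
    static final String LETTERS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Counts the distinct hash codes among all two-letter strings over LETTERS.
    static int distinctHashes() {
        Set<Integer> seen = new HashSet<>();
        for (char a : LETTERS.toCharArray()) {
            for (char b : LETTERS.toCharArray()) {
                seen.add(("" + a + b).hashCode());
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int total = LETTERS.length() * LETTERS.length(); // 52 * 52 = 2704 strings
        System.out.println("strings:         " + total);
        System.out.println("distinct hashes: " + distinctHashes());
        // 'F' * 31 + 'B' = 70 * 31 + 66 = 2236
        // 'E' * 31 + 'a' = 69 * 31 + 97 = 2236
        System.out.println("FB".hashCode() + " " + "Ea".hashCode());
    }
}
```

    Any time the count of distinct hashes is smaller than the count of strings, at least two of the strings collide, which is what happens here because a two-letter hash is just 31 * first + second.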


    Another thing to note is that the chance of 2^32 different strings chosen at random (from a much larger unbiased set of strings) having no hash collisions is vanishingly small. To understand why, read the Wikipedia page on the Birthday Paradox.

    In reality, the only way you are going to get no hash collisions in a set of 2^32 different strings is if you deliberately select or generate the strings. Even forming the set by selecting randomly generated strings is going to be computationally expensive. To produce such a set efficiently, you would need to exploit the properties of the String::hashCode algorithm, which (fortunately) is specified.

  • 2020-11-28 09:31

    Yes, this is possible. The contract between the equals() and hashCode() methods of the Object class only requires that equal objects have equal hash codes. If two objects are not equal according to equals(), there is no guarantee that their hash codes differ: if obj1.equals(obj2) returns false, then obj1.hashCode() == obj2.hashCode() may or may not be true. Example:

        String str1 = "FB";
        String str2 = "Ea";
        System.out.println(str1.equals(str2));// false
        System.out.println(str1.hashCode() == str2.hashCode()); // true
    
  • 2020-11-28 09:36

    Yes, it is entirely possible. The probability that some string (or other object; strings are assumed here) has the same hash code as another string in a collection depends on the size of that collection, assuming all strings in it are unique. The approximate probabilities are:

    • With a set of size ~9,000, there is a ~1% chance that two strings in the set share a hash code
    • With a set of size ~30,000, there is a ~10% chance that two strings in the set share a hash code
    • With a set of size ~77,000, there is a ~50% chance that two strings in the set share a hash code

    The assumptions made are:

    • The hashCode function has no bias
    • Each string in the aforementioned set is unique

    This site explains it clearly: http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/ (Look at "the second thing you should know")
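
    The figures above can be checked with the usual birthday-problem approximation, p ≈ 1 - exp(-n(n-1) / (2 * 2^32)). The following is a rough sketch under the same assumptions (an unbiased 32-bit hash and unique strings); the class and method names are made up for this example.

```java
// Hypothetical helper class for illustration; not part of the JDK.
class BirthdayEstimate {
    static final double HASH_SPACE = 4294967296.0; // 2^32 possible int values

    // Approximate probability that at least two of n uniformly random
    // 32-bit hash codes coincide: 1 - exp(-n(n-1) / (2 * 2^32)).
    static double collisionProbability(long n) {
        return 1.0 - Math.exp(-(double) n * (n - 1) / (2.0 * HASH_SPACE));
    }

    public static void main(String[] args) {
        for (long n : new long[] {9_000, 30_000, 77_000}) {
            System.out.printf("n = %,d -> %.3f%n", n, collisionProbability(n));
        }
    }
}
```

    This is only an approximation, and real String data is rarely uniformly distributed, so treat the results as ballpark values.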

  • 2020-11-28 09:43

    A Java hash code is 32 bits. The number of possible strings it hashes is infinite.

    So yes, there will be collisions. The percentage is meaningless - there is an infinite number of items (strings) and a finite number of possible hashes.
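
    As a small illustration of how easy collisions are to produce, the sketch below builds a family of colliding strings from the known pair "FB" / "Ea". Because both blocks have the same length and the same hash code, swapping one for the other anywhere in a string leaves the overall hash unchanged. The class name CollisionFamily is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the JDK.
class CollisionFamily {
    // Builds all 2^blocks strings made of the two-character blocks "FB" and "Ea".
    static List<String> build(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                // hash(prefix + block) = hash(prefix) * 31 * 31 + hash(block),
                // and hash("FB") == hash("Ea"), so both extensions collide.
                next.add(prefix + "FB");
                next.add(prefix + "Ea");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : build(3)) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

    With k blocks this yields 2^k distinct strings that all share one hash code.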

  • 2020-11-28 09:45
    System.out.println("tensada".hashCode());
    System.out.println("friabili".hashCode());
    

    Java's hash function returns equal values here.

  • 2020-11-28 09:50

    This wouldn't directly answer your question, but I hope it helps.

    The following is from the source code of java.lang.String (an older JDK release, in which String still carried offset and count fields).

    /**
     * Returns a hash code for this string. The hash code for a
     * <code>String</code> object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the
     * <i>i</i>th character of the string, <code>n</code> is the length of
     * the string, and <code>^</code> indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
        int h = hash;
        int len = count;
        if (h == 0 && len > 0) {
            int off = offset;
            char val[] = value;
    
            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }
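
    For experimenting outside the JDK, the documented formula can be reproduced in a few lines. This is a standalone sketch, not the JDK's actual code; the class name HashFormula is made up, and it omits the caching in the hash field.

```java
// Hypothetical helper class for illustration; not part of the JDK.
class HashFormula {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with Horner's rule in int arithmetic.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around silently
        }
        return h;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "FB", "Ea", "tensada", "friabili"}) {
            System.out.println("\"" + s + "\": " + hash(s) + " vs " + s.hashCode());
        }
    }
}
```

    Since the result is squeezed into a 32-bit int, different character sequences can and do end up with the same value, as the pairs in main show.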
    