Best implementation for hashCode method for a collection

难免孤独 2020-11-22 01:39

How do we decide on the best implementation of the hashCode() method for a collection (assuming that the equals() method has been overridden correctly)?

20 answers
  • 2020-11-22 02:23

    For a simple class it is often easiest to implement hashCode() based on the class fields which are checked by the equals() implementation.

    public class Zam {
        private String foo;
        private String bar;
        private String somethingElse;
    
        public boolean equals(Object obj) {
            if (this == obj) {
                return true;
            }
    
            if (obj == null) {
                return false;
            }
    
            if (getClass() != obj.getClass()) {
                return false;
            }
    
            Zam otherObj = (Zam)obj;
    
            // Null-safe comparison of the two fields that define equality.
            return java.util.Objects.equals(getFoo(), otherObj.getFoo())
                    && java.util.Objects.equals(getBar(), otherObj.getBar());
        }
    
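        @Override
        public int hashCode() {
            // Hash exactly the fields that equals() compares. Objects.hash is null-safe and,
            // unlike foo + bar, does not merge ("a", "bc") and ("ab", "c") into one input.
            return java.util.Objects.hash(getFoo(), getBar());
        }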
    
        public String getFoo() {
            return foo;
        }
    
        public String getBar() {
            return bar;
        }
    }
    

    The most important thing is to keep hashCode() and equals() consistent: if equals() returns true for two objects, hashCode() must return the same value for both. If equals() returns false, hashCode() is not required to return different values, but doing so for most unequal objects reduces collisions in hash-based collections.
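
    A minimal sketch of the two points above. The class and method names (HashContractDemo, concatHash, fieldHash) are hypothetical and only serve the illustration. It shows that unequal objects may share a hash code, and that combining fields by string concatenation creates avoidable collisions that Objects.hash does not have for the same inputs.

```java
import java.util.Objects;

final class HashContractDemo {
    // Combines two fields by concatenation: the boundary between the fields is lost.
    static int concatHash(String foo, String bar) {
        return (foo + bar).hashCode();
    }

    // Combines two fields with Objects.hash: null-safe, each field is hashed separately.
    static int fieldHash(String foo, String bar) {
        return Objects.hash(foo, bar);
    }

    public static void main(String[] args) {
        // Unequal objects may legitimately share a hash code.
        System.out.println("Aa".hashCode() == "BB".hashCode());              // true
        // Both pairs concatenate to "abc", so they always collide.
        System.out.println(concatHash("a", "bc") == concatHash("ab", "c")); // true
        // Hashing the fields separately keeps these two pairs apart.
        System.out.println(fieldHash("a", "bc") == fieldHash("ab", "c"));   // false
    }
}
```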

  • 2020-11-22 02:27

    Although this is linked to the Android documentation (Wayback Machine) and my own code on GitHub, it works for Java in general. My answer extends dmeister's answer with code that is easier to read and understand.

    @Override 
    public int hashCode() {
    
        // Start with a non-zero constant. Prime is preferred
        int result = 17;
    
        // Include a hash for each field.
    
        // Primitives
    
        result = 31 * result + (booleanField ? 1 : 0);                   // 1 bit   » 32-bit
    
        result = 31 * result + byteField;                                // 8 bits  » 32-bit 
        result = 31 * result + charField;                                // 16 bits » 32-bit
        result = 31 * result + shortField;                               // 16 bits » 32-bit
        result = 31 * result + intField;                                 // 32 bits » 32-bit
    
        result = 31 * result + (int)(longField ^ (longField >>> 32));    // 64 bits » 32-bit
    
        result = 31 * result + Float.floatToIntBits(floatField);         // 32 bits » 32-bit
    
        long doubleFieldBits = Double.doubleToLongBits(doubleField);     // 64 bits (double) » 64-bit (long) » 32-bit (int)
        result = 31 * result + (int)(doubleFieldBits ^ (doubleFieldBits >>> 32));
    
        // Objects
    
        result = 31 * result + Arrays.hashCode(arrayField);              // var bits » 32-bit
    
        result = 31 * result + referenceField.hashCode();                // var bits » 32-bit (non-nullable)   
        result = 31 * result +                                           // var bits » 32-bit (nullable)   
            (nullableReferenceField == null
                ? 0
                : nullableReferenceField.hashCode());
    
        return result;
    
    }
    

    EDIT

    Typically, when you override hashCode(), you also want to override equals(...). For those who will implement equals() or already have, here is a good reference from my GitHub:

    @Override
    public boolean equals(Object o) {
    
        // Optimization (not required).
        if (this == o) {
            return true;
        }
    
        // Return false if the other object has the wrong type, interface, or is null.
        if (!(o instanceof MyType)) {
            return false;
        }
    
        MyType other = (MyType) o; // the object this instance is compared against
    
                // Primitive fields
        return     booleanField == other.booleanField
                && byteField    == other.byteField
                && charField    == other.charField
                && shortField   == other.shortField
                && intField     == other.intField
                && longField    == other.longField
                // compare() agrees with the bit-based hashCode() for NaN and -0.0; == does not
                && Float.compare(floatField, other.floatField) == 0
                && Double.compare(doubleField, other.doubleField) == 0
    
                // Arrays
    
                && Arrays.equals(arrayField, other.arrayField)
    
                // Objects
    
                && referenceField.equals(other.referenceField)
                && (nullableReferenceField == null
                            ? other.nullableReferenceField == null
                            : nullableReferenceField.equals(other.nullableReferenceField));
    }
    
  • 2020-11-22 02:28

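    As you specifically asked about collections, I'd like to add an aspect that the other answers haven't mentioned yet: a HashMap does not expect its keys to change their hash code once they have been added to the collection. The map files each entry under the hash code it had at insertion time, so a key whose hash code changes afterwards can no longer be found, which defeats the whole purpose.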
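
    A minimal sketch of that failure mode, using a HashSet (which is backed by a HashMap) and a mutable ArrayList as the element. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MutableKeyDemo {
    // Returns whether the set still finds the element after it was mutated in place.
    static boolean foundAfterMutation() {
        List<String> key = new ArrayList<>();
        key.add("a");

        Set<List<String>> set = new HashSet<>();
        set.add(key);      // stored under the hash code of ["a"]
        key.add("b");      // the list's hash code changes, the stored entry does not move

        return set.contains(key);
    }

    public static void main(String[] args) {
        // The element is still in the set, but the lookup uses the new hash code.
        System.out.println(foundAfterMutation()); // false
    }
}
```

    This is why fields that take part in hashCode() are best kept immutable for objects used as map keys or set elements.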

  • 2020-11-22 02:29

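    Here is another JDK 1.7+ approach that also takes the superclass into account. I find it convenient: it depends only on the JDK and needs no extra manual work. Note that Objects.hash() tolerates null arguments. One caveat: super.hashCode() should only be mixed in when the superclass overrides hashCode() itself. Object.hashCode() is identity-based, so including it makes two instances with equal field values hash differently.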

    I have not included any equals() implementation, but in real code you will of course need one.

    import java.util.Objects;
    
    public class Demo {
    
        public static class A {
    
            private final String param1;
    
            public A(final String param1) {
                this.param1 = param1;
            }
    
            @Override
            public int hashCode() {
                // A extends Object directly, so the identity-based super.hashCode() is left out.
                return Objects.hash(this.param1);
            }
    
        }
    
        public static class B extends A {
    
            private final String param2;
            private final String param3;
    
            public B(
                final String param1,
                final String param2,
                final String param3) {
    
                super(param1);
                this.param2 = param2;
                this.param3 = param3;
            }
    
            @Override
            public final int hashCode() {
                return Objects.hash(
                    super.hashCode(),
                    this.param2,
                    this.param3);
            }
        }
    
        public static void main(String [] args) {
    
            A a = new A("A");
            B b = new B("A", "B", "C");
    
            System.out.println("A: " + a.hashCode());
            System.out.println("B: " + b.hashCode());
        }
    
    }
    
  • 2020-11-22 02:32

    The best implementation? That is a hard question because it depends on the usage pattern.

    A reasonably good implementation for nearly all cases was proposed in Josh Bloch's Effective Java, Item 9 (second edition). It is best to look it up there, because the author explains why the approach works.

    A short version

    1. Create an int result and assign it a non-zero value.

    2. For every field f tested in the equals() method, calculate a hash code c by:

      • If the field f is a boolean: calculate (f ? 1 : 0);
      • If the field f is a byte, char, short or int: calculate (int)f;
      • If the field f is a long: calculate (int)(f ^ (f >>> 32));
      • If the field f is a float: calculate Float.floatToIntBits(f);
      • If the field f is a double: calculate Double.doubleToLongBits(f) and handle the return value like every long value;
      • If the field f is an object: Use the result of the hashCode() method or 0 if f == null;
      • If the field f is an array: treat every element as a separate field, calculate its hash value recursively, and combine the values as described next.
    3. Combine the hash value c with result:

      result = 31 * result + c
      
    4. Return result

    This should result in a proper distribution of hash values for most use cases.
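
    The steps above can be sketched for a concrete class. Employee and its fields are hypothetical, and the seed 17 and multiplier 31 are assumptions chosen for the example. Any non-zero seed and odd multiplier follow the same recipe.

```java
import java.util.Arrays;
import java.util.Objects;

final class Employee { // hypothetical class, for illustration only
    private final boolean active;
    private final long badge;
    private final double salary;
    private final String name;   // may be null
    private final int[] scores;

    Employee(boolean active, long badge, double salary, String name, int[] scores) {
        this.active = active;
        this.badge = badge;
        this.salary = salary;
        this.name = name;
        this.scores = scores;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        Employee e = (Employee) o;
        return active == e.active
                && badge == e.badge
                && Double.compare(salary, e.salary) == 0
                && Objects.equals(name, e.name)
                && Arrays.equals(scores, e.scores);
    }

    @Override
    public int hashCode() {
        int result = 17;                                              // step 1: non-zero seed
        result = 31 * result + (active ? 1 : 0);                      // boolean
        result = 31 * result + (int) (badge ^ (badge >>> 32));        // long
        long bits = Double.doubleToLongBits(salary);                  // double, then treated as long
        result = 31 * result + (int) (bits ^ (bits >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode());  // object, 0 for null
        result = 31 * result + Arrays.hashCode(scores);               // array, element by element
        return result;                                                // step 4
    }
}
```

    Two instances built from the same values, even with separate scores arrays, are equal and therefore return the same hash code.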

  • 2020-11-22 02:34

    The standard implementation is weak and using it leads to unnecessary collisions. Imagine a

    class ListPair {
        List<Integer> first;
        List<Integer> second;
    
        ListPair(List<Integer> first, List<Integer> second) {
            this.first = first;
            this.second = second;
        }
    
        public int hashCode() {
            return Objects.hash(first, second);
        }
    
        ...
    }
    

    Now,

    new ListPair(List.of(a), List.of(b, c))
    

    and

    new ListPair(List.of(b), List.of(a, c))
    

    have the same hashCode, namely 31*(a+b) + c plus a constant, because the multiplier used by List.hashCode is reused here. Obviously, collisions are unavoidable, but producing needless collisions is just... needless.
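
    A small check of that collision, assuming a = 1, b = 2, c = 3 and Java 9+ for List.of. The class and method names are hypothetical, and pairHash stands in for the hashCode() of ListPair.

```java
import java.util.List;
import java.util.Objects;

final class ListPairCollisionDemo {
    // Same combination as the ListPair above: Objects.hash reuses the multiplier 31
    // that List.hashCode already uses internally.
    static int pairHash(List<Integer> first, List<Integer> second) {
        return Objects.hash(first, second);
    }

    public static void main(String[] args) {
        int h1 = pairHash(List.of(1), List.of(2, 3));
        int h2 = pairHash(List.of(2), List.of(1, 3));
        // The two pairs are different, yet their hash codes are equal.
        System.out.println(h1 == h2); // true
    }
}
```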

    There's nothing substantially smart about using 31. The multiplier must be odd in order to avoid losing information (any even multiplier loses at least the most significant bit, multiples of four lose two, etc.). Any odd multiplier is usable. Small multipliers may lead to faster computation (the JIT can use shifts and additions), but given that multiplication has a latency of only three cycles on modern Intel/AMD CPUs, this hardly matters. Small multipliers also lead to more collisions for small inputs, which may sometimes be a problem.
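
    The information loss with an even multiplier can be seen directly in 32-bit int arithmetic. This is a sketch with hypothetical names. The multiplier 32 and the input 12345 are arbitrary choices for the example.

```java
final class MultiplierParityDemo {
    // Plain int multiplication; overflow wraps around modulo 2^32.
    static int times(int multiplier, int x) {
        return multiplier * x;
    }

    public static void main(String[] args) {
        int x = 12345;
        int y = x + (1 << 27); // differs from x only in bit 27

        // 32 * 2^27 = 2^32 wraps to 0, so the difference between x and y vanishes.
        System.out.println(times(32, x) == times(32, y)); // true
        // An odd multiplier keeps the two inputs distinct.
        System.out.println(times(31, x) == times(31, y)); // false
    }
}
```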

    Using a prime is pointless as primes have no meaning in the ring Z/(2**32).

    So, I'd recommend using a randomly chosen big odd number (feel free to take a prime). As x86/amd64 CPUs can use a shorter instruction for operands fitting in a single signed byte, there is a tiny speed advantage for multipliers like 109. For minimizing collisions, take something like 0x58a54cf5.
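
    As a sketch of that recommendation, here is the two-list pair hashed with the suggested multiplier instead of 31. The names are hypothetical, both lists are assumed non-null, and List.of needs Java 9+. With a = 1, b = 2, c = 3, the two pairs that collided under Objects.hash now get different hash codes.

```java
import java.util.List;

final class BigMultiplierDemo {
    // Large odd multiplier taken from the text above. It differs from the 31 used by List.hashCode.
    static final int MULTIPLIER = 0x58a54cf5;

    // Combines the two list hashes. Assumes both lists are non-null.
    static int pairHash(List<Integer> first, List<Integer> second) {
        return MULTIPLIER * first.hashCode() + second.hashCode();
    }

    public static void main(String[] args) {
        int h1 = pairHash(List.of(1), List.of(2, 3));
        int h2 = pairHash(List.of(2), List.of(1, 3));
        System.out.println(h1 == h2); // false
    }
}
```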

    Using different multipliers in different places is helpful, but probably not enough to justify the additional work.
