Overriding hashCode() in Java

问题

I created a class "Book":

public class Book {

public static int idCount = 1;

private int id;
private String title;
private String author;
private String publisher;
private int yearOfPublication;
private int numOfPages;
private Cover cover;

...

}

And then i need to override the hashCode() and equals() methods.

@Override
public int hashCode() {

    int result = id; // !!!

    result = 31 * result + (title != null ? title.hashCode() : 0);
    result = 31 * result + (author != null ? author.hashCode() : 0);
    result = 31 * result + (publisher != null ? publisher.hashCode() : 0);
    result = 31 * result + yearOfPublication;
    result = 31 * result + numOfPages;
    result = 31 * result + (cover != null ? cover.hashCode() : 0);

    return result;
}

It's no problem with equals(). I just wondering about one thing in hashCode() method.

Note: IntelliJ IDEA generated that hashCode() method.

So, is it OK to set the result variable to id, or should i use some prime number?

What is the better choice here?

Thanks!

回答1:

Note that only the initial value of the result is set to id, not the final one. The final value is calculated by combining that initial value with hash codes of other parts of the object, multiplied by a power of a small prime number (i.e. 31). Using id rather than an arbitrary prime is definitely right in this context.

In general, there is no advantage to hash code being prime (it's the number of hash buckets that needs to be prime). Using an int as its own hash code (in your case, that's id and numOfPages) is a valid approach.

回答2:

It helps to know what the hashCode is used for. It's supposed to help you map a theoretically infinite set of objects to fitting in a small number of "bins", with each bin having a number, and each object saying which bin it wants to go in based on its hashCode. The question is not whether it's okay to do one thing or another, but whether what you want to do matches what the hashCode function is for.

As per http://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode(), it's not about the number you return, it's about how it behaves for different objects of the same class.

If the object doesn't change, the hashCode must be the same value every time you call the hashCode() function.
Two objects that are equal according to .equals, must have the same hashCode.
Two objects that are not equal may have the same hashCode. (if this wasn't the case, there would be no point in using the hashCode at all, because every object already has a unique object pointer)

If you're reimplementing the hashCode function, the most important thing is to either rely on a tool to generate it for you, or to use code you understand that obeys those rules. The basic Java hashCode function uses an incredibly well-researched, seemingly simple bit of code for String hashing, so the code you see is based on turning everything into Strings and falling back to that.

If you don't know why that works, don't touch it. Just rely on it working and move on. That 31 is ridiculously important and ensures an even hashing distribution. See Why does Java's hashCode() in String use 31 as a multiplier? for the why on that one.

However, this might also be way more than you need. You could use id, but then you're basically negating the reason to use a hashCode (because now every object will want to be in a bin on its own, turning any hashed collection into a flat array. Kind of silly).

If you know the distribution of your id values, there are far easier hashCodes to come up with. Say you know they are always between 0 and Interger.MAX_VALUE, and you know there are never any gaps between ids, you could simply generate a hashCode like

final int modulus = Intereger.MAX_VALUE / 255;
int hashCode() {
  return this.id % modulus;
}

now, you have a hashCode optimised for 255 bins, fulfilling the necessary requirements for an acceptable hashCode function.

回答3:

Note : In my answer I am assuming that you know how hash code is meant to be used. The following just talks about any potential optimization using a non-zero constant for the initial value of result may produce.

If id is rarely 0 then it's fine to use it. However, if it's 0 frequently you should use some constant instead (just using 1 should be fine). The reason you want for it to be non-zero is so that the 31 * result part always adds some value to the hash. That way say if object A has all fields null or 0 except for yearOfPublication = 1 and object B has all fields null or 0 except for numOfPages = 1 the hash codes will be:

A.hashCode() => initialValue * 31 ^ 4 + 1
B.hashCode() => initialValue * 31 ^ 5 + 1

As you can see if initialValue is 0 then both hash codes are the same, however if it's not 0 then they will be different. It is preferable for them to be different so as to reduce collisions in data structures that use the hash code like HashMap.

That said, in your example of the Book class it is likely that id will never be 0. In fact, if id uniquely identifies the Book then you can have the hashCode() method just return the id.

来源：https://stackoverflow.com/questions/18965374/overriding-hashcode-in-java

标签

java

hash

overriding