Why not allow an external interface to provide hashCode/equals for a HashMap?

闹比i 2020-12-17 15:28

With a TreeMap it's trivial to provide a custom Comparator, thus overriding the semantics provided by Comparable objects added to the

9 answers
  • 2020-12-17 15:49

    This is an interesting idea, but it's absolutely horrendous for performance. The reason is fundamental to how a hashtable works: the ordering of its elements cannot be relied upon. Hashtables are very fast (expected constant time) because of the way they index elements: they compute a pseudo-unique integer hash for the element and use it to access a location in an array. The table literally computes a location in memory and stores the element there directly.
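
As a rough illustration of "computing a location", here is a minimal sketch of how a key's hash code can be mapped to a slot in a backing array. The names `BucketIndexSketch`, `spread` and `indexFor` are made up for this example. The bit-mixing step resembles what OpenJDK's `HashMap` has done since Java 8, but collision handling and resizing are left out entirely.

```java
final class BucketIndexSketch {
    // Mix the high bits into the low bits so that small tables still
    // benefit from differences in the upper half of the hash code.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Map a key to a slot in a table whose length is a power of two.
    static int indexFor(Object key, int tableLength) {
        int h = (key == null) ? 0 : key.hashCode();
        return spread(h) & (tableLength - 1);
    }

    public static void main(String[] args) {
        String[] table = new String[16];
        String key = "example";
        int slot = indexFor(key, table.length);
        table[slot] = key; // store directly at the computed slot
        // An equal key always maps to the same slot, so lookup needs no traversal.
        System.out.println(key.equals(table[indexFor("example", table.length)]));
    }
}
```

Note that the slot depends only on `hashCode()`, which is why iteration order over such a table says nothing about any ordering of the keys.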

    This contrasts with a balanced binary search tree (TreeMap), which must start at the root and work its way down to the desired node every time a lookup is required. Wikipedia has some more in-depth analysis. To summarize, the efficiency of a tree map depends on a consistent ordering, so the order of its elements is predictable and sane. However, because of the cost of this "traverse to your destination" approach, BSTs can only provide O(log(n)) lookups. For large maps, this can be a significant performance hit.

    It is possible to impose a consistent ordering on a hashtable, but doing so involves techniques similar to LinkedHashMap and manually maintaining the ordering. Alternatively, two separate data structures can be maintained internally: a hashtable and a tree. The table can be used for lookups, while the tree can be used for iteration. The problem, of course, is that this uses more than double the required memory. Also, insertions are only as fast as the tree: O(log(n)). Concurrent tricks can bring this down a bit, but that isn't a reliable performance optimization.
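
For reference, the JDK already offers both flavours of ordered map mentioned here. The following small sketch (the class and method names are invented for illustration) inserts the same keys into a `LinkedHashMap` and a `TreeMap` and shows the iteration order each one yields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class IterationOrderSketch {
    // Insert the same keys into the given map and return them in iteration order.
    static List<String> keysAfterInsert(Map<String, Integer> map) {
        map.put("pear", 1);
        map.put("apple", 2);
        map.put("mango", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap: hash lookups plus a linked list that remembers insertion order.
        System.out.println(keysAfterInsert(new LinkedHashMap<>())); // [pear, apple, mango]
        // TreeMap: keys come back sorted by their natural ordering.
        System.out.println(keysAfterInsert(new TreeMap<>()));       // [apple, mango, pear]
    }
}
```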

    In short, your idea sounds really good, but if you actually tried to implement it, you would see that doing so imposes massive performance limitations. The final verdict is (and has been for decades): if you need performance, use a hashtable; if you need ordering and can live with degraded performance, use a balanced binary search tree. I'm afraid there's really no way to combine the two structures efficiently without losing some of the guarantees of one or the other.

  • 2020-12-17 15:52

    Note: as the other answers point out, HashMaps don't have an explicit ordering. They only recognize "equality". Getting an order out of a hash-based data structure is meaningless, as each object is turned into a hash, which is essentially a random number.

    You can always write a hash function for a class (and often must), as long as you do it carefully. This is hard to do properly because hash-based data structures rely on a random, uniform distribution of hash values. Effective Java devotes a large amount of text to properly implementing a hash method with good behaviour.

    With all that being said, if you just want your hashing to ignore the case of a String, you can write a wrapper class around String for this purpose and insert those in your data structure instead.

    A simple implementation:

    import java.util.Locale;

    public final class LowerStringWrapper {
        public LowerStringWrapper(String s) {
            this.s = s;
            // Locale.ROOT keeps the case folding independent of the default locale.
            this.lowerString = s.toLowerCase(Locale.ROOT);
        }

        // getter methods omitted

        // Rely on the hashing of String, as we know it to be good.
        @Override
        public int hashCode() { return lowerString.hashCode(); }

        // We overrode hashCode, so we MUST also override equals. It is required
        // that if a.equals(b), then a.hashCode() == b.hashCode(), so we must
        // restore that invariant.
        @Override
        public boolean equals(Object obj) {
            // Only another wrapper can be equal. Accepting a bare String here
            // would break symmetry, because String.equals(wrapper) is always false.
            return obj instanceof LowerStringWrapper
                && lowerString.equals(((LowerStringWrapper) obj).lowerString);
        }

        private final String s;
        private final String lowerString;
    }
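
To see how such a wrapper behaves as a map key, here is a short usage sketch. `FoldedKey` is a compact, hypothetical stand-in for the wrapper above so that the example compiles on its own; the header names are arbitrary sample data.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class WrapperUsageSketch {
    // Compact stand-in for the wrapper above: equality and hashing use the folded form.
    static final class FoldedKey {
        private final String folded;
        FoldedKey(String s) { this.folded = s.toLowerCase(Locale.ROOT); }
        @Override public int hashCode() { return folded.hashCode(); }
        @Override public boolean equals(Object o) {
            return o instanceof FoldedKey && folded.equals(((FoldedKey) o).folded);
        }
    }

    static Map<FoldedKey, Integer> sampleMap() {
        Map<FoldedKey, Integer> headers = new HashMap<>();
        headers.put(new FoldedKey("Content-Length"), 42);
        return headers;
    }

    public static void main(String[] args) {
        Map<FoldedKey, Integer> headers = sampleMap();
        // Every lookup has to wrap its key too; that is the price of this approach.
        System.out.println(headers.get(new FoldedKey("content-length")));         // 42
        System.out.println(headers.containsKey(new FoldedKey("CONTENT-LENGTH"))); // true
    }
}
```

The extra allocation per lookup is usually cheap, but it does leak the wrapper type into every caller of the map.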
    
  • 2020-12-17 15:54

    A bit late for you, but for future visitors, it might be worth knowing that commons-collections has an AbstractHashedMap (in 3.2.2 and with generics in 4.0). You can override these protected methods to achieve your desired behaviour:

    protected int hash(Object key) { ... }
    protected boolean isEqualKey(Object key1, Object key2) { ... }
    protected boolean isEqualValue(Object value1, Object value2) { ... }
    protected HashEntry createEntry(
        HashEntry next, int hashCode, Object key, Object value) { ... }
    

    An example implementation of such an alternative HashedMap is commons-collections' own IdentityMap (only up to 3.2.2 as Java has its own since 1.4).

    This is not as powerful as providing an external "Hasharator" to a Map instance. You have to implement a new map class for every hashing strategy (composition vs. inheritance striking back...). But it's still good to know.
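
For comparison, an external "Hasharator" can be approximated with the JDK alone by wrapping keys internally. The sketch below is only one possible shape: `Hasharator`, `HasharatorMap` and `HasharatorDemo` are hypothetical names, not types from the JDK or commons-collections, and only `put`, `get` and `size` are implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical strategy interface: supplies hashing and equality from outside the key.
interface Hasharator<T> {
    int hash(T value);
    boolean equal(T a, T b);
}

// Minimal map that delegates key hashing and equality to an injected Hasharator.
final class HasharatorMap<K, V> {
    // Internal key whose hashCode/equals forward to the strategy.
    private static final class Key<T> {
        final T value;
        final Hasharator<? super T> strategy;
        Key(T value, Hasharator<? super T> strategy) {
            this.value = value;
            this.strategy = strategy;
        }
        @Override public int hashCode() { return strategy.hash(value); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            @SuppressWarnings("unchecked") // keys never leave the enclosing map
            Key<T> other = (Key<T>) o;
            return strategy.equal(value, other.value);
        }
    }

    private final Hasharator<? super K> strategy;
    private final Map<Key<K>, V> delegate = new HashMap<>();

    HasharatorMap(Hasharator<? super K> strategy) { this.strategy = strategy; }

    V put(K key, V value) { return delegate.put(new Key<K>(key, strategy), value); }
    V get(K key) { return delegate.get(new Key<K>(key, strategy)); }
    int size() { return delegate.size(); }
}

final class HasharatorDemo {
    // Case-insensitive strategy; hash and equal use the same folding so they stay consistent.
    static final Hasharator<String> IGNORE_CASE = new Hasharator<String>() {
        @Override public int hash(String s) { return s.toLowerCase(Locale.ROOT).hashCode(); }
        @Override public boolean equal(String a, String b) {
            return a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
        }
    };

    public static void main(String[] args) {
        HasharatorMap<String, Integer> m = new HasharatorMap<>(IGNORE_CASE);
        m.put("Hello", 1);
        m.put("HELLO", 2); // same key under the strategy, so the value is overwritten
        System.out.println(m.get("hello")); // 2
        System.out.println(m.size());       // 1
    }
}
```

This uses composition, so one map class serves any strategy, at the cost of one small wrapper object per operation.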

  • 2020-12-17 15:58

    Trove4j has the feature I'm after and they call it hashing strategies.

    Their map implementation has different limitations and thus different prerequisites, so this does not by itself mean that an implementation for Java's "native" HashMap would be feasible.

  • 2020-12-17 16:00

    Good question, ask Josh Bloch. I submitted that concept as an RFE for Java 7, but it was dropped; I believe the reason was something performance related. I agree, though, it should have been done.

  • 2020-12-17 16:00

    There's such a feature in com.google.common.collect.CustomConcurrentHashMap. Unfortunately, there's currently no public way to set the Equivalence (their Hasharator). Maybe they're not yet done with it, or maybe they don't consider the feature useful enough. Ask on the Guava mailing list.

    I wonder why it hasn't happened yet, as it was mentioned in this talk over two years ago.
