With a TreeMap
it\'s trivial to provide a custom Comparator
, thus overriding the semantics provided by Comparable
objects added to the
I suspect this has not been done because it would prevent hashCode caching?
I attempted creating a generic Map solution where all keys are silently wrapped. It turned out that the wrapper would have to hold the wrapped object, the cached hashCode and a reference to the callback interface responsible for equality-checks. This is obviously not as efficient as using a wrapper class, where you'd only have to cache the original key plus one more object (see hazzens answer).
(I also bumped into a problem related to generics; the get-method accepts Object as input, so the callback interface responsible for hashing would have to perform an additional instanceof-check. Either that, or the map class would have to know the Class of its keys.)
HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.
public interface HashingStrategy<E>
{
int computeHashCode(E object);
boolean equals(E object1, E object2);
}
You can't use a HashingStrategy
with the built in HashSet
or HashMap
. GS Collections includes a java.util.Set called UnifiedSetWithHashingStrategy
and a java.util.Map called UnifiedMapWithHashingStrategy
.
Let's look at an example.
public class Data
{
private final int id;
public Data(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
// No equals or hashcode
}
Here's how you might set up a UnifiedSetWithHashingStrategy
and use it.
java.util.Set<Data> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));
// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));
// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));
Why not just use a Map
? UnifiedSetWithHashingStrategy
uses half the memory of a UnifiedMap
, and one quarter the memory of a HashMap
. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.
How do we perform lookups? Remember that Sets have contains()
, but not get()
. UnifiedSetWithHashingStrategy
implements Pool
in addition to Set
, so it also implements a form of get()
.
Here's a simple approach to handle case-insensitive Strings.
UnifiedSetWithHashingStrategy<String> set =
new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(String::toLowerCase));
set.add("ABC");
Assert.assertTrue(set.contains("ABC"));
Assert.assertTrue(set.contains("abc"));
Assert.assertFalse(set.contains("def"));
Assert.assertEquals("ABC", set.get("aBc"));
This shows off the API, but it's not appropriate for production. The problem is that the HashingStrategy constantly delegates to String.toLowerCase()
which creates a bunch of garbage Strings. Here's how you can create an efficient hashing strategy for case-insensitive Strings.
public static final HashingStrategy<String> CASE_INSENSITIVE =
new HashingStrategy<String>()
{
@Override
public int computeHashCode(String string)
{
int hashCode = 0;
for (int i = 0; i < string.length(); i++)
{
hashCode = 31 * hashCode + Character.toLowerCase(string.charAt(i));
}
return hashCode;
}
@Override
public boolean equals(String string1, String string2)
{
return string1.equalsIgnoreCase(string2);
}
};
Note: I am a developer on GS collections.
.NET has this via IEqualityComparer (for a type which can compare two objects) and IEquatable (for a type which can compare itself to another instance).
In fact, I believe it was a mistake to define equality and hashcodes in java.lang.Object or System.Object at all. Equality in particular is hard to define in a way which makes sense with inheritance. I keep meaning to blog about this...
But yes, basically the idea is sound.