问题
I'm looking for built-in alternatives of HashSet
and Dictionary
objects that have better performance than lists but do not use the internal GetHashCode
method. I need this because for the class I have written, there is no way of writing a GetHashCode
method that fulfills the usual contract with Equals
other than
public override int GetHashCode() { return 0; } // or return any other constant value
which would turn HashSet
and Dictionary
into ordinary lists (performance-wise).
So what I need is a set implementation and a mapping implementation. Any suggestions?
EDIT:
My class is a tolerance-based 3-dimensional vector class:
public class Vector
{
private static const double TOL = 1E-10;
private double x, y, z;
public Vector(double x, double y, double z)
{
this.x = x; this.y = y; this.z = z;
}
public override bool Equals(object o)
{
Vector other = o as Vector;
if (other == null)
return false;
return ((Math.Abs(x - other.x) <= TOL) &&
(Math.Abs(y - other.y) <= TOL) &&
(Math.Abs(z - other.z) <= TOL));
}
}
Note that my Equals
method is not transitive. However, in my use case I can make it "locally" transitive because at some point, I will know all vectors that I need to put into my set / mapping key set, and I also know that they will come in clusters. So when I have collected all vectors, I will choose one representative per cluster and replace all original vectors by the representative. Then Equals
will be transitive among the elements of my set / mapping key set.
When I have my set or mapping, I will collect vectors from another source (for the sake of this question let's assume I'll ask a user to type in a vector). These can be any possible vector. Those will never be added to the set/mapping, but I will need to know if they are contained in the set / key set of the mapping (regarding tolerance), and I will need to know their value from the mapping.
回答1:
You need a data structure that supports sorting, binary search and fast insertion. Unfortunately there is no such collection in the .NET Framework. The SortedDictionary
doesn't supports binary search, while the SortedList
suffers from O(n) insertion for unsorted data. So you must search for a third party tool. A good candidate seems to be the TreeDictionary
of C5 library. It is a red-black tree implementation that offers the important method RangeFromTo
. Here is an incomplete implementation of a Dictionary that has Vectors as keys, backed internally by a C5.TreeDictionary:
public class VectorDictionary<TValue>
{
private C5.TreeDictionary<double, (Vector, TValue)> _tree =
new C5.TreeDictionary<double, (Vector, TValue)>();
public bool TryGetKeyValue(Vector key, out (Vector, TValue) pair)
{
double xyz = key.X + key.Y + key.Z;
// Hoping that not all vectors are crowded in the same diagonal line
var range = _tree.RangeFromTo(xyz - Vector.TOL * 3, xyz + Vector.TOL * 3);
var equalPairs = range.Where(e => e.Value.Item1.Equals(key));
// Selecting a vector from many "equal" vectors is tricky.
// Some may be more equal than others. :-) Lets return the first for now.
var selectedPair = equalPairs.FirstOrDefault().Value;
pair = selectedPair;
return selectedPair.Item1 != null;
}
public Vector GetExisting(Vector key)
{
return TryGetKeyValue(key, out var pair) ? pair.Item1 : default;
}
public bool Contains(Vector key) => TryGetKeyValue(key, out var _);
public bool Add(Vector key, TValue value)
{
if (Contains(key)) return false;
_tree.Add(key.X + key.Y + key.Z, (key, value));
return true;
}
public TValue this[Vector key]
{
get => TryGetKeyValue(key, out var pair) ? pair.Item2 : default;
set => _tree.Add(key.X + key.Y + key.Z, (key, value));
}
public int Count => _tree.Count;
}
Usage example:
var dictionary = new VectorDictionary<int>();
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.5 * 1E-10, 0, 0), 1)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(0.6 * 1E-10, 0, 0), 2)}");
Console.WriteLine($"Added: {dictionary.Add(new Vector(1.6 * 1E-10, 0, 0), 3)}");
Console.WriteLine($"dictionary.Count: {dictionary.Count}");
Console.WriteLine($"dictionary.Contains: {dictionary.Contains(new Vector(2.5 * 1E-10, 0, 0))}");
Console.WriteLine($"dictionary.GetValue: {dictionary[new Vector(2.5 * 1E-10, 0, 0)]}");
Output:
Added: True
Added: False
Added: True
dictionary.Count: 2
dictionary.Contains: True
dictionary.GetValue: 3
回答2:
You can get a reasonably good hashcode implementation in your case. Remember that the most important rule for a hash code is the following:
- Two equal vectors must return the same value
This does not mean that two different vectors cannot return the same value; they obviously have to in some cases, the number of hashes is limited, the number of distinct vectors for all purposes isn't.
Well, with that in mind, simply evaluate your hashcode based upon the vectors coordinates truncated to the tolerance's significant digits minus one. All equal vectors will give you the same hash and a small minority of non equal vectors that differ in the last decimal wont...you can live with that.
UPDATE: Changed rounded to truncated. Rounding is not the right choice.
来源:https://stackoverflow.com/questions/38590854/c-sharp-performant-alternatives-to-hashset-and-dictionary-that-do-not-use-gethas