Dictionary with tuple key slower than nested dictionary. Why?

Submitted by 我只是一个虾纸丫 on 2021-02-08 13:45:25

Question


I've tested the speed of retrieving, updating and removing values in a dictionary using an (int, int, string) tuple as the key versus the same operations on a nested dictionary: Dictionary<int, Dictionary<int, Dictionary<string, string>>>.

My tests show the tuple dictionary to be a lot slower (58% for retrieving, 69% for updating and 200% for removing). I did not expect that. The nested dictionary needs to do more lookups, so why is the tuple dictionary that much slower?

My test code:

    // Removes each requested key from the tuple-keyed dictionary.
    public static object TupleDic_RemoveValue(object[] param)
    {
        var dic = param[0] as Dictionary<(int did, int eid, string name), string>;
        var keysToRemove = param[2] as List<(int did, int eid, string name)>;

        foreach (var key in keysToRemove)
        {
            // Remove returns false (and does not throw) when the key is absent.
            dic.Remove(key);
        }

        return dic;
    }


    // Removes each requested key from the nested dictionary (did -> eid -> name).
    public static object NestedDic_RemoveValue(object[] param)
    {
        var dic = param[1] as Dictionary<int, Dictionary<int, Dictionary<string, string>>>;
        var keysToRemove = param[2] as List<(int did, int eid, string name)>;

        foreach (var key in keysToRemove)
        {
            // Walk the two outer levels first; only touch the innermost map if both exist.
            if (dic.TryGetValue(key.did, out var elementMap) && elementMap.TryGetValue(key.eid, out var propertyMap))
                propertyMap.Remove(key.name);
        }

        return dic;
    }

Extra info on the test: the dictionary contains a total of 10 000 entries. The keys are incrementing: ([0-100],[0-100],"Property[0-100]"). In a single test, 100 keys are retrieved (of which 10% were not present in the dictionary), 100 values are updated (of which 10% are new) or 100 keys are removed (of which 10% were not in the dictionary to begin with). Retrieval, updating and removal were three separate tests. Each test was executed 1000 times. I compared both the mean and median execution time.


Answer 1:


Lookups in a Dictionary rely on two things. The first is an item's hash code which is used to separate the items into buckets. Two different keys can have the same hash code, so once a potential match is found, Equals is called against each item (with that hash code) until an exact match is found.
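To make the hash-then-Equals sequence visible, here is a minimal sketch that counts Equals calls. CollidingKey and CollisionDemo are hypothetical names invented for this illustration; the key deliberately returns the same hash code for every instance, which is the worst case rather than anything a real key type should do.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: every instance reports the same hash code, so all
// keys share one bucket chain and Equals has to tell them apart.
public sealed class CollidingKey : IEquatable<CollidingKey>
{
    public static int EqualsCalls;   // how often Equals has been invoked
    public int Id { get; }

    public CollidingKey(int id) => Id = id;

    public override int GetHashCode() => 1;   // deliberate worst case

    public bool Equals(CollidingKey other)
    {
        EqualsCalls++;
        return other != null && other.Id == Id;
    }

    public override bool Equals(object obj) => Equals(obj as CollidingKey);
}

public static class CollisionDemo
{
    // Fills a dictionary with `count` colliding keys (ids 0..count-1), looks up
    // `idToFind`, and returns how many Equals calls that single lookup needed.
    public static int EqualsCallsForLookup(int count, int idToFind)
    {
        var dic = new Dictionary<CollidingKey, string>();
        for (int i = 0; i < count; i++)
            dic[new CollidingKey(i)] = "value" + i;

        CollidingKey.EqualsCalls = 0;   // ignore the calls made while filling
        dic.TryGetValue(new CollidingKey(idToFind), out _);
        return CollidingKey.EqualsCalls;
    }

    public static void Main()
    {
        // Looking up an id that was never added forces a comparison
        // against every entry that shares the hash code.
        Console.WriteLine(EqualsCallsForLookup(100, -1));
    }
}
```

With well-distributed hash codes the same lookup would typically need zero or one Equals call, which is why the quality of GetHashCode matters so much here.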

ValueTuple's hash code implementation (for arity-2+ *) passes the result of EqualityComparer<T>.Default.GetHashCode for each item in the tuple to an internal method ValueTuple.CombineHashCodes, which in turn calls System.Numerics.Hashing.HashHelpers.Combine. The more items in the tuple, the more nested calls to both of the Combine methods. Compare this to a normal int's GetHashCode which just returns the value directly.
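The difference in work can be sketched roughly as follows. TupleHashSketch, Combine and HashTriple are hypothetical names, and the rotate-and-xor combiner is only modelled on the general shape of such helpers; it is not the framework's source, and the real implementation varies between runtime versions, so the value it produces should not be expected to match the built-in tuple hash.

```csharp
using System;

public static class TupleHashSketch
{
    // Illustrative combiner (assumption, not the actual framework code):
    // rotate the first hash left by 5 bits, add it, then xor in the second.
    public static int Combine(int h1, int h2)
    {
        unchecked
        {
            uint rol5 = ((uint)h1 << 5) | ((uint)h1 >> 27);
            return ((int)rol5 + h1) ^ h2;
        }
    }

    // Hash for an (int, int, string) key: one GetHashCode call per item
    // plus two nested Combine calls.
    public static int HashTriple(int did, int eid, string name)
    {
        int h1 = did.GetHashCode();
        int h2 = eid.GetHashCode();
        int h3 = name == null ? 0 : name.GetHashCode();
        return Combine(Combine(h1, h2), h3);
    }

    public static void Main()
    {
        // An int is its own hash code: no combining at all.
        Console.WriteLine(42.GetHashCode());

        // The sketch's three-item hash, next to the built-in tuple hash.
        Console.WriteLine(HashTriple(1, 2, "Property3"));
        Console.WriteLine((1, 2, "Property3").GetHashCode());
    }
}
```

Each level of the nested dictionary only pays for the cheap single-item hash, whereas the tuple key pays for all three item hashes plus the combining steps on every operation.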

It makes sense to me that your latter example would be faster. As pointed out in the comments, you are also cutting the necessary data to search into smaller partitions. Each lookup has to call GetHashCode and upon finding a potential match, Equals. It seems to me that there's a higher chance for hash collision in the first scenario, which would mean more calls to Equals (which in this case is just a call to EqualityComparer<T>.Default.Equals for each item in the tuple).

In the end it comes down to profiling (and profiling properly: Release mode, warming up so the calls are already JIT-compiled, enough iterations, etc.) as well as your particular use case.
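As a rough idea of what such a measurement could look like, here is a minimal Stopwatch-based sketch. LookupTiming and MedianTicks are hypothetical names, and the 10 x 10 x 10 data set and iteration count are arbitrary choices, not the question's exact setup. It only covers the warm-up and iteration points; it still has to be built in Release mode, and a dedicated harness such as BenchmarkDotNet handles these concerns more rigorously.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    // Runs `action` once as a warm-up (so JIT compilation is not measured),
    // then times `iterations` runs and returns the median in Stopwatch ticks.
    public static long MedianTicks(Action action, int iterations)
    {
        action();
        var samples = new List<long>(iterations);
        var sw = new Stopwatch();
        for (int i = 0; i < iterations; i++)
        {
            sw.Restart();
            action();
            sw.Stop();
            samples.Add(sw.ElapsedTicks);
        }
        samples.Sort();
        return samples[samples.Count / 2];
    }

    public static void Main()
    {
        var tupleDic = new Dictionary<(int did, int eid, string name), string>();
        var nestedDic = new Dictionary<int, Dictionary<int, Dictionary<string, string>>>();
        var keys = new List<(int did, int eid, string name)>();

        // Arbitrary small data set, filled identically into both structures.
        for (int d = 0; d < 10; d++)
        for (int e = 0; e < 10; e++)
        for (int p = 0; p < 10; p++)
        {
            var key = (did: d, eid: e, name: "Property" + p);
            keys.Add(key);
            tupleDic[key] = "value";

            if (!nestedDic.TryGetValue(d, out var byEid))
                nestedDic[d] = byEid = new Dictionary<int, Dictionary<string, string>>();
            if (!byEid.TryGetValue(e, out var byName))
                byEid[e] = byName = new Dictionary<string, string>();
            byName[key.name] = "value";
        }

        long tupleTicks = MedianTicks(() =>
        {
            foreach (var key in keys)
                tupleDic.TryGetValue(key, out _);
        }, 1000);

        long nestedTicks = MedianTicks(() =>
        {
            foreach (var key in keys)
                if (nestedDic.TryGetValue(key.did, out var byEid) &&
                    byEid.TryGetValue(key.eid, out var byName))
                    byName.TryGetValue(key.name, out _);
        }, 1000);

        Console.WriteLine($"tuple key:  {tupleTicks} ticks (median)");
        Console.WriteLine($"nested key: {nestedTicks} ticks (median)");
    }
}
```

The numbers will differ between machines and runtime versions, so they are only meaningful relative to each other within one run.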

If performance really matters in your use case (lookups in a tight loop, for example), perhaps it would be better to use your own type and hash code/equals implementations rather than ValueTuples. But again, it comes down to profiling.
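A custom key along those lines might look like the sketch below. PropertyKey is a hypothetical type whose fields mirror the (did, eid, name) tuple from the question, and the multiply-by-31 hash is one common hand-rolled scheme chosen for illustration, not a recommendation over measuring alternatives.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type mirroring the (did, eid, name) tuple.
public readonly struct PropertyKey : IEquatable<PropertyKey>
{
    public readonly int Did;
    public readonly int Eid;
    public readonly string Name;

    public PropertyKey(int did, int eid, string name)
    {
        Did = did;
        Eid = eid;
        Name = name;
    }

    // Strongly typed Equals: compares the cheap ints first and avoids boxing.
    public bool Equals(PropertyKey other) =>
        Did == other.Did &&
        Eid == other.Eid &&
        string.Equals(Name, other.Name, StringComparison.Ordinal);

    public override bool Equals(object obj) =>
        obj is PropertyKey other && Equals(other);

    // Hand-rolled hash (assumption: a simple multiply-and-add scheme).
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Did;
            hash = hash * 31 + Eid;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            return hash;
        }
    }
}

public static class PropertyKeyDemo
{
    public static void Main()
    {
        var dic = new Dictionary<PropertyKey, string>
        {
            [new PropertyKey(1, 2, "Property3")] = "value"
        };

        // A separately constructed but equal key finds and removes the entry.
        Console.WriteLine(dic.ContainsKey(new PropertyKey(1, 2, "Property3")));
        Console.WriteLine(dic.Remove(new PropertyKey(1, 2, "Property3")));
    }
}
```

Implementing IEquatable<PropertyKey> lets the default comparer call the typed Equals directly. On runtimes that provide it, System.HashCode.Combine could replace the manual arithmetic; whether either variant beats the nested dictionary is something only a measurement can answer.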

* Note that there is a special case for a 1-arity tuple.

HashHelpers.Combine

ValueTuple

Int32.GetHashCode



Source: https://stackoverflow.com/questions/48986130/dictionary-with-tuple-key-slower-than-nested-dictionary-why
