LINQ Performance for Large Collections

梦毁少年i 2021-02-02 11:48

I have a large collection of strings (up to 1M) alphabetically sorted. I have experimented with LINQ queries against this collection using HashSet, SortedDictionary, and Dictio

6 Answers
  •  不知归路
    2021-02-02 12:09

    Your current code doesn't make use of any of the special features of the Dictionary, SortedDictionary, or HashSet collections; you are using them the same way you would use a List. That is why you don't see any difference in performance.
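
    To illustrate the point, here is a minimal sketch (not the asker's code, which isn't shown in full here) contrasting a LINQ scan with a lookup that actually uses the hash structure. The class and method names are hypothetical. Note that a hash lookup only helps for exact keys; it does nothing for a prefix search on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Linear scan: LINQ enumerates the elements one by one,
    // exactly as it would for a List<string>.
    public static bool ContainsByScan(IEnumerable<string> items, string target)
    {
        return items.Where(s => s == target).Any();
    }

    // Hash lookup: uses the structure the HashSet was built for.
    public static bool ContainsByHash(HashSet<string> items, string target)
    {
        return items.Contains(target);
    }

    public static void Main()
    {
        // One million made-up strings, "item0000000" through "item0999999".
        var set = new HashSet<string>(
            Enumerable.Range(0, 1000000).Select(i => "item" + i.ToString("D7")));

        Console.WriteLine(ContainsByScan(set, "item0999999"));
        Console.WriteLine(ContainsByHash(set, "item0999999"));
    }
}
```

    Both calls give the same answer; only the second avoids walking the collection.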

    If you use a dictionary as an index, where the first few characters of each string are the key and a list of strings is the value, you can use the search string to pick out the small part of the collection that contains possible matches.

    I wrote the class below to test this. If I populate it with a million strings and search with an eight-character string, it rips through all possible matches in about 3 ms. Searching with a one-character string is the worst case, but it finds the first 1000 matches in about 4 ms. Finding all matches for a one-character string takes about 25 ms.

    The class creates indexes for 1, 2, 4 and 8 character keys. If you look at your specific data and what you search for, you should be able to select what indexes to create to optimise it for your conditions.
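
    One way to look at your data before choosing index lengths is to check how large the biggest bucket would be for each candidate prefix length. The sketch below is only an assumption about how you might do that; PrefixStats and LargestBucket are hypothetical names, and the sample strings are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixStats
{
    // Returns the size of the largest bucket that a prefix index of the
    // given length would produce. Strings shorter than the prefix are
    // skipped, since they cannot be stored under a key of that length.
    public static int LargestBucket(IEnumerable<string> values, int prefixLength)
    {
        return values
            .Where(s => s.Length >= prefixLength)
            .GroupBy(s => s.Substring(0, prefixLength))
            .Select(g => g.Count())
            .DefaultIfEmpty(0)
            .Max();
    }

    public static void Main()
    {
        var sample = new List<string> { "apple", "apply", "apricot", "banana", "band", "b" };
        foreach (int length in new[] { 1, 2, 4, 8 })
        {
            Console.WriteLine("prefix length " + length
                + ": largest bucket = " + LargestBucket(sample, length));
        }
    }
}
```

    A prefix length whose largest bucket is still close to the size of the whole collection buys you little; one whose buckets are all tiny may not be worth the extra memory.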

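    using System.Collections.Generic;
    using System.Linq;
    
    public class IndexedList {
    
        // Maps a fixed-length prefix to the list of strings that start with it.
        private class Index : Dictionary<string, List<string>> {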
    
            private int _indexLength;
    
            public Index(int indexLength) {
                _indexLength = indexLength;
            }
    
            public void Add(string value) {
                if (value.Length >= _indexLength) {
                    string key = value.Substring(0, _indexLength);
                    List<string> list;
                    if (!this.TryGetValue(key, out list)) {
                        Add(key, list = new List<string>());
                    }
                    list.Add(value);
                }
            }
    
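            public IEnumerable<string> Find(string query, int limit) {
                // Guard against a prefix that no stored string starts with;
                // the indexer would throw KeyNotFoundException in that case.
                List<string> list;
                if (!this.TryGetValue(query.Substring(0, _indexLength), out list)) {
                    return Enumerable.Empty<string>();
                }
                // The length check excludes strings equal to the query itself.
                return list
                    .Where(s => s.Length > query.Length && s.StartsWith(query))
                    .Take(limit);
            }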
    
        }
    
        private Index _index1;
        private Index _index2;
        private Index _index4;
        private Index _index8;
    
        public IndexedList(IEnumerable<string> values) {
            _index1 = new Index(1);
            _index2 = new Index(2);
            _index4 = new Index(4);
            _index8 = new Index(8);
            foreach (string value in values) {
                _index1.Add(value);
                _index2.Add(value);
                _index4.Add(value);
                _index8.Add(value);
            }
        }
    
        // Use the longest index whose key length fits the query.
        public IEnumerable<string> Find(string query, int limit) {
            if (query.Length >= 8) return _index8.Find(query, limit);
            if (query.Length >= 4) return _index4.Find(query, limit);
            if (query.Length >= 2) return _index2.Find(query, limit);
            return _index1.Find(query, limit);
        }
    
    }
    
