How to subtract one huge list from another efficiently in C#

天命终不由人 2021-02-07 00:09

I have a very long list of Ids (integers) that represents all the items that are currently in my database:

var idList = GetAllIds();

I also hav

4 Answers
  • 2021-02-07 00:41

    LINQ could help:

    itemsToAdd.Except(idList)
    

    Your code is slow because List<T>.Contains is O(n). So your total cost is O(itemsToAdd.Count*idList.Count).

    You can turn idList into a HashSet<T>, whose .Contains is O(1) on average. Or just use the LINQ .Except extension method, which does that for you.

    Note that .Except also removes duplicates from the left side. For example, new int[]{1,1,2}.Except(new int[]{2}) yields just {1}; the second 1 is dropped. I assume that is not a problem in your case, because IDs are typically unique.
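
    To make the difference concrete, here is a minimal sketch that puts .Except next to a HashSet<int>-backed filter that keeps duplicates. The class name ListDiff and both method names are made up for illustration, and it assumes both collections hold plain int IDs.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public static class ListDiff
    {
        // Set difference via LINQ. Except hashes idList internally, so the cost
        // is roughly O(itemsToAdd.Count + idList.Count) instead of the product.
        // Duplicates in itemsToAdd are collapsed.
        public static List<int> NewIdsDistinct(IEnumerable<int> itemsToAdd, IEnumerable<int> idList)
        {
            return itemsToAdd.Except(idList).ToList();
        }

        // Alternative that keeps duplicates: filter against an explicit HashSet<int>.
        public static List<int> NewIdsKeepDuplicates(IEnumerable<int> itemsToAdd, IEnumerable<int> idList)
        {
            var existing = new HashSet<int>(idList);
            return itemsToAdd.Where(id => !existing.Contains(id)).ToList();
        }
    }
    ```

    With itemsToAdd = {1, 1, 2, 3} and idList = {2}, the first method returns {1, 3} and the second returns {1, 1, 3}.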

  • 2021-02-07 00:44

    Assuming the following premises are true:

    • idList and itemsToAdd do not contain duplicate values
    • you are using the .NET Framework 4.0

    you could use a HashSet<T> this way:

    var itemsToAddSet = new HashSet<int>(itemsToAdd);
    itemsToAddSet.ExceptWith(idList);
    

    According to the documentation, the HashSet<T>.ExceptWith method is pretty efficient:

    This method is an O(n) operation, where n is the number of elements in the other parameter.

    In your case n is the number of items in idList.
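
    Wrapped up as a self-contained sketch, the approach might look like the following. The class and method names (SetDiff, Subtract) are hypothetical, and it assumes the IDs are ints. Building the set costs O(m) for the m elements of itemsToAdd, so the whole operation is about O(m + n).

    ```csharp
    using System.Collections.Generic;

    public static class SetDiff
    {
        // Returns the IDs in itemsToAdd that are not already in idList.
        public static HashSet<int> Subtract(IEnumerable<int> itemsToAdd, IEnumerable<int> idList)
        {
            // Copy itemsToAdd into a set so the caller's collection is left untouched.
            var itemsToAddSet = new HashSet<int>(itemsToAdd);
            // Remove every element of idList from the set, in place.
            itemsToAddSet.ExceptWith(idList);
            return itemsToAddSet;
        }
    }
    ```

    For example, Subtract(new[] { 5, 6, 7 }, new[] { 6, 8 }) leaves a set containing 5 and 7.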

  • 2021-02-07 00:49

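    Temporarily convert idList to a HashSet<T> and use the same approach, i.e.:

    // Build the lookup set once; Contains is then O(1) on average.
    var idListHash = new HashSet<int>(idList);
    items.RemoveAll(e => idListHash.Contains(e.Id));
    

    It should be much faster.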
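
    As a self-contained sketch of this idea: the Item class below is a hypothetical stand-in, since the question's actual item type is not shown, and ItemFilter.RemoveExisting is a made-up helper name.

    ```csharp
    using System.Collections.Generic;

    // Hypothetical item type; only the Id property matters here.
    public class Item
    {
        public int Id { get; set; }
    }

    public static class ItemFilter
    {
        // Removes, in place, every item whose Id already exists in idList.
        // Returns the number of items removed.
        public static int RemoveExisting(List<Item> items, IEnumerable<int> idList)
        {
            var idListHash = new HashSet<int>(idList);
            return items.RemoveAll(e => idListHash.Contains(e.Id));
        }
    }
    ```

    Unlike the set-based variants, this keeps the remaining items in their original order and does not drop duplicates from items.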

  • 2021-02-07 00:49

    You should use two HashSet<int>s.
    Note that a HashSet<int> stores only unique values and does not guarantee any ordering.
