I am looking for a way to quickly remove items from a C# List
. The documentation states that the List.Remove()
and List.RemoveAt()<
I've found when dealing with large lists, this is often faster. The speed of the Remove and finding the right item in the dictionary to remove, more than makes up for creating the dictionary. A couple things though, the original list has to have unique values, and I don't think the order is guaranteed once you are done.
List<long> hundredThousandItemsInOrignalList;
List<long> fiftyThousandItemsToRemove;
// populate lists...
Dictionary<long, long> originalItems = hundredThousandItemsInOrignalList.ToDictionary(i => i);
foreach (long i in fiftyThousandItemsToRemove)
{
originalItems.Remove(i);
}
List<long> newList = originalItems.Select(i => i.Key).ToList();
I feel a HashSet
, LinkedList
or Dictionary
will do you much better.
Ok try RemoveAll used like this
static void Main(string[] args)
{
Stopwatch watch = new Stopwatch();
watch.Start();
List<Int32> test = GetList(500000);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
watch.Reset(); watch.Start();
test.RemoveAll( t=> t % 5 == 0);
List<String> test2 = test.ConvertAll(delegate(int i) { return i.ToString(); });
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine((500000 - test.Count).ToString());
Console.ReadLine();
}
static private List<Int32> GetList(int size)
{
List<Int32> test = new List<Int32>();
for (int i = 0; i < 500000; i++)
test.Add(i);
return test;
}
this only loops twice and removes eactly 100,000 items
My output for this code:
00:00:00.0099495
00:00:00.1945987
1000000
Updated to try a HashSet
static void Main(string[] args)
{
Stopwatch watch = new Stopwatch();
do
{
// Test with list
watch.Reset(); watch.Start();
List<Int32> test = GetList(500000);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
watch.Reset(); watch.Start();
List<String> myList = RemoveTest(test);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine((500000 - test.Count).ToString());
Console.WriteLine();
// Test with HashSet
watch.Reset(); watch.Start();
HashSet<String> test2 = GetStringList(500000);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
watch.Reset(); watch.Start();
HashSet<String> myList2 = RemoveTest(test2);
watch.Stop(); Console.WriteLine(watch.Elapsed.ToString());
Console.WriteLine((500000 - test.Count).ToString());
Console.WriteLine();
} while (Console.ReadKey().Key != ConsoleKey.Escape);
}
static private List<Int32> GetList(int size)
{
List<Int32> test = new List<Int32>();
for (int i = 0; i < 500000; i++)
test.Add(i);
return test;
}
static private HashSet<String> GetStringList(int size)
{
HashSet<String> test = new HashSet<String>();
for (int i = 0; i < 500000; i++)
test.Add(i.ToString());
return test;
}
static private List<String> RemoveTest(List<Int32> list)
{
list.RemoveAll(t => t % 5 == 0);
return list.ConvertAll(delegate(int i) { return i.ToString(); });
}
static private HashSet<String> RemoveTest(HashSet<String> list)
{
list.RemoveWhere(t => Convert.ToInt32(t) % 5 == 0);
return list;
}
This gives me:
00:00:00.0131586
00:00:00.1454723
100000
00:00:00.3459420
00:00:00.2122574
100000
You could always remove the items from the end of the list. List removal is O(1) when performed on the last element since all it does is decrement count. There is no shifting of next elements involved. (which is the reason why list removal is O(n) generally)
for (int i = list.Count - 1; i >= 0; --i)
list.RemoveAt(i);
Or you could do this:
List<int> listA;
List<int> listB;
...
List<int> resultingList = listA.Except(listB);