I am looking for a way to quickly remove items from a C# List<T>. The documentation states that the List.Remove() and List.RemoveAt() operations are both O(n), which makes them slow when many items need to be removed.
If the order does not matter, there is a simple O(1) removal trick: overwrite the item with the last element of the list and then remove the last element.
using System;
using System.Collections.Generic;

public static class ListExt
{
    // O(1): overwrite the item with the last element, then drop the last element.
    public static void RemoveBySwap<T>(this List<T> list, int index)
    {
        list[index] = list[list.Count - 1];
        list.RemoveAt(list.Count - 1);
    }

    // O(n): finding the item is O(n), the removal itself is O(1).
    public static void RemoveBySwap<T>(this List<T> list, T item)
    {
        int index = list.IndexOf(item);
        if (index >= 0)
            RemoveBySwap(list, index);
    }

    // O(n): finding a match is O(n), the removal itself is O(1).
    public static void RemoveBySwap<T>(this List<T> list, Predicate<T> predicate)
    {
        int index = list.FindIndex(predicate);
        if (index >= 0)
            RemoveBySwap(list, index);
    }
}
This solution is cache-friendly because the list's backing array is traversed sequentially, so even if you need to find the index first it will still be very fast.
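As a quick illustrative sketch of how the extension methods above might be used (class and variable names are just examples), note that each removal changes the order of the remaining elements, which is exactly the trade-off being made:

using System;
using System.Collections.Generic;

class RemoveBySwapDemo
{
    static void Main()
    {
        var fruit = new List<string> { "apple", "banana", "cherry", "date" };

        // Remove by index: "banana" is overwritten by the last element ("date").
        fruit.RemoveBySwap(1);                       // apple, date, cherry

        // Remove by value: IndexOf is O(n), the swap itself is O(1).
        fruit.RemoveBySwap("cherry");                // apple, date

        // Remove by predicate: first element starting with "d".
        fruit.RemoveBySwap(s => s.StartsWith("d"));  // apple

        Console.WriteLine(string.Join(", ", fruit)); // apple
    }
}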
Notes:
List<T> isn't an efficient data structure when it comes to removal. You would do better to use a doubly linked list (LinkedList<T>), as removal simply requires updating the references in the adjacent nodes.
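As a rough sketch of what that buys you (names here are purely illustrative): once you already hold a LinkedListNode<T>, LinkedList<T>.Remove(node) only rewires the neighbouring nodes and is O(1), although finding a node by value is still O(n).

using System;
using System.Collections.Generic;

class LinkedListRemovalDemo
{
    static void Main()
    {
        var linked = new LinkedList<int>(new[] { 1, 2, 3, 4, 5 });

        // Keep a node reference around (e.g. stored when the item was added).
        LinkedListNode<int> third = linked.Find(3);   // O(n) search

        // Removing via the node reference only rewires the neighbours: O(1).
        linked.Remove(third);

        Console.WriteLine(string.Join(", ", linked)); // 1, 2, 4, 5
    }
}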
If you're happy creating a new list, you don't have to go through setting items to null. For example:
// This overload of Where provides the index as well as the value. Unless
// you need the index, use the simpler overload which just provides the value.
List<string> newList = oldList.Where((value, index) => index % 5 != 0)
.ToList();
However, you might want to look at alternative data structures, such as LinkedList<T> or HashSet<T>. It really depends on what features you need from your data structure.
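For example, if you don't need duplicates or index-based access, removal from a HashSet<T> is an O(1) hash lookup on average; a minimal sketch (names are illustrative):

using System;
using System.Collections.Generic;

class HashSetRemovalDemo
{
    static void Main()
    {
        var set = new HashSet<string> { "red", "green", "blue" };

        // Remove is a hash lookup, O(1) on average; returns false if the item wasn't present.
        bool removed = set.Remove("green");

        // e.g. "True: red, blue" (enumeration order is not guaranteed)
        Console.WriteLine($"{removed}: {string.Join(", ", set)}");
    }
}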
The other answers (and the question itself) offer various ways of dealing with this "slug" (slowness bug) using the built-in .NET Framework classes.
But if you're willing to switch to a third-party library, you can get better performance simply by changing the data structure, and leaving your code unchanged except for the list type.
The Loyc Core libraries include two types that work the same way as List<T> but can remove items faster: DList<T>, which is faster when removing items from random locations, and AList<T>, which is faster when your lists are very long (but may be slower when the list is short).

Lists are faster than LinkedLists until n gets really big. The reason is that cache misses occur far more frequently with LinkedLists than with Lists, and memory (RAM) lookups are expensive. Because a list is backed by an array, the CPU can load a chunk of data at once, since it knows the required data is stored contiguously. A linked list, on the other hand, gives the CPU no hint about which data is needed next, which forces it to perform many more memory lookups.
For further details take a look at: https://jackmott.github.io/programming/2016/08/20/when-bigo-foolsya.html
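To see the effect yourself, here is a rough, non-rigorous sketch (class name and element count are arbitrary); the exact numbers depend on your hardware, but iterating a List<int> usually beats iterating a LinkedList<int> by a wide margin because the array is read from contiguous memory:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class CacheFriendlinessDemo
{
    static void Main()
    {
        const int n = 10_000_000;
        var list = Enumerable.Range(0, n).ToList();
        var linked = new LinkedList<int>(list);

        var sw = Stopwatch.StartNew();
        long listSum = 0;
        foreach (int x in list) listSum += x;      // contiguous array: cache-friendly
        sw.Stop();
        Console.WriteLine($"List<int>:       {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        long linkedSum = 0;
        foreach (int x in linked) linkedSum += x;  // pointer chasing: frequent cache misses
        sw.Stop();
        Console.WriteLine($"LinkedList<int>: {sw.ElapsedMilliseconds} ms");
    }
}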
If you still want to use a List as an underlying structure, you can use the following extension method, which does the heavy lifting for you.
using System.Collections.Generic;
using System.Linq;

namespace Library.Extensions
{
    public static class ListExtensions
    {
        // Removes every element of 'range' from 'list' and returns the elements
        // that were actually removed. Note that Intersect/Except are set
        // operations, so duplicate elements are collapsed.
        public static IEnumerable<T> RemoveRange<T>(this List<T> list, IEnumerable<T> range)
        {
            var removed = list.Intersect(range).ToArray();
            if (!removed.Any())
            {
                return Enumerable.Empty<T>();
            }

            var remaining = list.Except(removed).ToArray();
            list.Clear();
            list.AddRange(remaining);
            return removed;
        }
    }
}
A simple stopwatch test completes the removal in about 200 ms. Keep in mind this is not a rigorous benchmark.
using System;
using System.Diagnostics;
using System.Linq;
using Library.Extensions;

public class Program
{
    static void Main(string[] args)
    {
        var list = Enumerable
            .Range(0, 500_000)
            .Select(x => x.ToString())
            .ToList();

        // Every fifth item will be removed.
        var allFifthItems = list.Where((_, index) => index % 5 == 0).ToArray();

        var sw = Stopwatch.StartNew();
        list.RemoveRange(allFifthItems);
        sw.Stop();

        var message = $"{allFifthItems.Length} elements removed in {sw.Elapsed}";
        Console.WriteLine(message);
    }
}
Output:
100000 elements removed in 00:00:00.2291337