Say that I have a LINQ query such as:
var authors = from x in authorsList
where x.firstname == "Bob"
select x;
I was wondering whether there is any difference between RemoveAll and Except, and what the pros of using a HashSet are, so I did a quick performance check:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
namespace ListRemoveTest
{
class Program
{
private static Random random = new Random( (int)DateTime.Now.Ticks );
static void Main( string[] args )
{
Console.WriteLine( "Be patient, generating data..." );
List<string> list = new List<string>();
List<string> toRemove = new List<string>();
for( int x=0; x < 1000000; x++ )
{
string randString = RandomString( random.Next( 100 ) );
list.Add( randString );
if( random.Next( 1000 ) == 0 )
toRemove.Insert( 0, randString );
}
List<string> l1 = new List<string>( list );
List<string> l2 = new List<string>( list );
List<string> l3 = new List<string>( list );
List<string> l4 = new List<string>( list );
Console.WriteLine( "Be patient, testing..." );
Stopwatch sw1 = Stopwatch.StartNew();
l1.RemoveAll( toRemove.Contains );
sw1.Stop();
Stopwatch sw2 = Stopwatch.StartNew();
l2.RemoveAll( new HashSet<string>( toRemove ).Contains );
sw2.Stop();
Stopwatch sw3 = Stopwatch.StartNew();
l3 = l3.Except( toRemove ).ToList();
sw3.Stop();
Stopwatch sw4 = Stopwatch.StartNew();
l4 = l4.Except( new HashSet<string>( toRemove ) ).ToList();
sw4.Stop();
Console.WriteLine( "L1.Len = {0}, Time taken: {1}ms", l1.Count, sw1.Elapsed.TotalMilliseconds );
Console.WriteLine( "L2.Len = {0}, Time taken: {1}ms", l2.Count, sw2.Elapsed.TotalMilliseconds );
Console.WriteLine( "L3.Len = {0}, Time taken: {1}ms", l3.Count, sw3.Elapsed.TotalMilliseconds );
Console.WriteLine( "L4.Len = {0}, Time taken: {1}ms", l4.Count, sw4.Elapsed.TotalMilliseconds );
Console.ReadKey();
}
private static string RandomString( int size )
{
StringBuilder builder = new StringBuilder();
char ch;
for( int i = 0; i < size; i++ )
{
ch = Convert.ToChar( Convert.ToInt32( Math.Floor( 26 * random.NextDouble() + 65 ) ) );
builder.Append( ch );
}
return builder.ToString();
}
}
}
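Results below. In this run sw4 was never stopped or printed (sw3 was used for both L3 and L4) and l1.Count was printed on every line, so the L4 time just repeats L3 and only the L1 length is an actual measurement: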
Be patient, generating data...
Be patient, testing...
L1.Len = 985263, Time taken: 13411.8648ms
L2.Len = 985263, Time taken: 76.4042ms
L3.Len = 985263, Time taken: 340.6933ms
L4.Len = 985263, Time taken: 340.6933ms
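As we can see, the best option in this case is RemoveAll with a HashSet. Keep in mind that Except is a set operation and also drops duplicate elements, so it does not produce the same list as RemoveAll.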
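To make the comparison between the two approaches concrete, here is a minimal sketch of how RemoveAll with a HashSet and Except behave on a small list that contains a repeated element. The class name RemovalDemo and the helper names WithRemoveAll and WithExcept are made up for illustration and do not come from the benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class RemovalDemo
{
    // Copies the source, then removes matches in place.
    // Repeated surviving elements are kept.
    public static List<string> WithRemoveAll(IEnumerable<string> source, IEnumerable<string> toRemove)
    {
        var result = new List<string>(source);
        var lookup = new HashSet<string>(toRemove); // O(1) average Contains
        result.RemoveAll(lookup.Contains);
        return result;
    }

    // Set difference: surviving elements are also de-duplicated.
    public static List<string> WithExcept(IEnumerable<string> source, IEnumerable<string> toRemove)
    {
        return source.Except(toRemove).ToList();
    }

    public static void Main()
    {
        var names = new[] { "Jason", "Bob", "Frank", "Jason" };
        var remove = new[] { "Bob" };

        Console.WriteLine(string.Join(", ", WithRemoveAll(names, remove))); // Jason, Frank, Jason
        Console.WriteLine(string.Join(", ", WithExcept(names, remove)));    // Jason, Frank
    }
}
```

If the list may contain repeated elements that must survive, RemoveAll is the safer choice of the two.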
I think you could do something like this:
authorsList = (from a in authorsList
where !authors.Contains(a)
select a).ToList();
Although I think the solutions already given solve the problem in a more readable way.
It'd be better to use List<T>.RemoveAll to accomplish this.
authorsList.RemoveAll((x) => x.firstname == "Bob");
Say that authorsToRemove is an IEnumerable<T> that contains the elements you want to remove from authorsList.
Then here is another very simple way to accomplish the removal task asked by the OP:
authorsList.RemoveAll(authorsToRemove.Contains);
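When authorsToRemove is large, or is a lazy LINQ query, it may be worth copying it into a HashSet first so that each lookup does not rescan the sequence. The following is only a sketch: the Author class and the RemoveAuthors helper are hypothetical stand-ins, and the set uses default (reference) equality for Author.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's author type.
public class Author
{
    public string firstname { get; set; } = "";
}

public static class AuthorRemoval
{
    // Removes every element of authorsToRemove from authorsList in place
    // and returns how many elements were removed.
    public static int RemoveAuthors(List<Author> authorsList, IEnumerable<Author> authorsToRemove)
    {
        // Materialize once, before the list is modified. Lookups are
        // O(1) on average and use default (reference) equality here.
        var lookup = new HashSet<Author>(authorsToRemove);
        return authorsList.RemoveAll(lookup.Contains);
    }

    public static void Main()
    {
        var authorsList = new List<Author>
        {
            new Author { firstname = "Jason" },
            new Author { firstname = "Bob" },
            new Author { firstname = "Frank" },
            new Author { firstname = "Bob" },
        };

        // Lazy query, as in the question.
        var authors = from x in authorsList where x.firstname == "Bob" select x;

        int removed = RemoveAuthors(authorsList, authors);
        Console.WriteLine(removed);           // 2
        Console.WriteLine(authorsList.Count); // 2
    }
}
```

RemoveAll returns the number of removed elements, which the helper simply passes through.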
Simple solution:
static void Main()
{
List<string> myList = new List<string> { "Jason", "Bob", "Frank", "Bob" };
myList.RemoveAll(x => x == "Bob");
foreach (string s in myList)
{
Console.WriteLine(s); // prints "Jason", then "Frank"
}
}
LINQ has its origins in functional programming, which emphasises immutability of objects, so it doesn't provide a built-in way to update the original list in-place.
Note on immutability (taken from another SO answer):
Here is the definition of immutability from Wikipedia.
In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.
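As a rough illustration of the point about LINQ not updating the original list, the sketch below contrasts a LINQ filter, which builds a new list, with List<T>.RemoveAll, which changes the existing one. The class name InPlaceVsNewList and the helper Without are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class InPlaceVsNewList
{
    // LINQ style: builds a new list; the input list is left untouched.
    public static List<string> Without(List<string> source, string value)
    {
        return source.Where(x => x != value).ToList();
    }

    public static void Main()
    {
        var original = new List<string> { "Jason", "Bob", "Frank", "Bob" };

        var filtered = Without(original, "Bob");
        Console.WriteLine(original.Count); // 4, unchanged by the LINQ query
        Console.WriteLine(filtered.Count); // 2

        // List<T>.RemoveAll modifies the list it is called on.
        original.RemoveAll(x => x == "Bob");
        Console.WriteLine(original.Count); // 2
    }
}
```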