What is the best (and fastest) way to retrieve a random row using Linq to SQL when I have a condition, e.g. some field must be true?
I have random function query against DataTable
s:
var result = (from result in dt.AsEnumerable()
order by Guid.NewGuid()
select result).Take(3);
If the purpose of getting random rows is sampling, I have talked very briefly here about a nice approach from Larson et al., Microsoft Research team where they have developed a sampling framework for Sql Server using materialized views. There is a link to the actual paper also.
If you use LINQPad, switch to C# program mode and do this way:
void Main()
{
YourTable.OrderBy(v => Random()).FirstOrDefault.Dump();
}
[Function(Name = "NEWID", IsComposable = true)]
public Guid Random()
{
throw new NotImplementedException();
}
Another sample for Entity Framework:
var customers = db.Customers
.Where(c => c.IsActive)
.OrderBy(c => Guid.NewGuid())
.FirstOrDefault();
This does not work with LINQ to SQL. The OrderBy
is simply being dropped.
Using LINQ to SQL in LINQPad as C# statements look like
IEnumerable<Customer> customers = this.ExecuteQuery<Customer>(@"SELECT top 10 * from [Customers] order by newid()");
customers.Dump();
The generated SQL is
SELECT top 10 * from [Customers] order by newid()
EDIT: I've only just noticed this is LINQ to SQL, not LINQ to Objects. Use Marc's code to get the database to do this for you. I've left this answer here as a potential point of interest for LINQ to Objects.
Strangely enough, you don't actually need to get the count. You do, however, need to fetch every element unless you get the count.
What you can do is keep the idea of a "current" value and the current count. When you fetch the next value, take a random number and replace the "current" with "new" with a probability of 1/n where n is the count.
So when you read the first value, you always make that the "current" value. When you read the second value, you might make that the current value (probability 1/2). When you read the third value, you might make that the current value (probability 1/3) etc. When you've run out of data, the current value is a random one out of all the ones you read, with uniform probability.
To apply that with a condition, just ignore anything which doesn't meet the condition. The easiest way to do that is to only consider the "matching" sequence to start with, by applying a Where clause first.
Here's a quick implementation. I think it's okay...
public static T RandomElement<T>(this IEnumerable<T> source,
Random rng)
{
T current = default(T);
int count = 0;
foreach (T element in source)
{
count++;
if (rng.Next(count) == 0)
{
current = element;
}
}
if (count == 0)
{
throw new InvalidOperationException("Sequence was empty");
}
return current;
}