Question
Given this collection:
var list = new [] {
"1.one",
"2. two",
"no number",
"2.duplicate",
"300. three hundred",
"4-ignore this"};
How can I get the subset of items that start with a number followed by a dot (regex @"^\d+(?=\.)"), keeping only one item per distinct number? That is:
{"1.one", "2. two", "300. three hundred"}
UPDATE:
My attempt was to pass an IEqualityComparer to the Distinct method. I borrowed this GenericCompare class and tried the following code, to no avail:
var pattern = @"^\d+(?=\.)";
var comparer = new GenericCompare<string>(s => Regex.Match(s, pattern).Value);
list.Where(f => Regex.IsMatch(f, pattern)).Distinct(comparer);
Answer 1:
If you fancy a LINQ approach, you can add a named capture group to the regex, filter the items that match it, group them by the captured number, and finally take the first string of each group. I like the readability of this solution, but I wouldn't be surprised if there is a more efficient way of eliminating the duplicates. Let's see if somebody else comes up with a different approach.
Something like this:
list.Where(s => regex.IsMatch(s))
.GroupBy(s => regex.Match(s).Groups["num"].Value)
.Select(g => g.First())
You can give it a try with this sample:
public class Program
{
private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);
public static void Main()
{
var list = new [] {
"1.one",
"2. two",
"no number",
"2.duplicate",
"300. three hundred",
"4-ignore this"
};
var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
.GroupBy(s => regex.Match(s).Groups["num"].Value)
.Select(g => g.First());
distinctWithNumbers.ToList().ForEach(Console.WriteLine);
Console.ReadKey();
}
}
You can try this approach in this fiddle
As pointed out by @orad in the comments, MoreLinq has a DistinctBy() extension method that can replace the "group, then take the first item of each group" step used above to eliminate the duplicates:
var distinctWithNumbers = list.Where(s => regex.IsMatch(s))
.DistinctBy(s => regex.Match(s).Groups["num"].Value);
Try it in this fiddle
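For what it's worth, .NET 6 and later ship an Enumerable.DistinctBy in System.Linq itself, so on those versions the same idea may not need MoreLinq at all. A minimal sketch, assuming .NET 6+; the names DistinctByDemo and DistinctNumbered are made up for illustration, and the regex is run once per item by carrying the Match along:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DistinctByDemo
{
    // Same pattern as in the answer: leading digits captured as "num", then a dot.
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    // Hypothetical helper: keeps one item per leading number.
    public static List<string> DistinctNumbered(IEnumerable<string> items)
    {
        return items
            .Select(s => new { Text = s, Match = regex.Match(s) }) // run the regex once per item
            .Where(x => x.Match.Success)                           // drop items without "<digits>."
            .DistinctBy(x => x.Match.Groups["num"].Value)          // System.Linq, .NET 6 or later
            .Select(x => x.Text)
            .ToList();
    }

    public static void Main()
    {
        var list = new[] { "1.one", "2. two", "no number", "2.duplicate", "300. three hundred", "4-ignore this" };
        DistinctNumbered(list).ForEach(Console.WriteLine);
    }
}
```

If both MoreLinq and System.Linq are imported on .NET 6+, the two DistinctBy methods can clash, so it is probably simplest to pick one of them.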
EDIT
If you want to use your comparer, you need to implement GetHashCode so that it uses the expression as well:
public int GetHashCode(T obj)
{
return _expr.Invoke(obj).GetHashCode();
}
Then you can use the comparer with a lambda function that takes a string and gets the number using the regex:
var comparer = new GenericCompare<string>(s => regex.Match(s).Groups["num"].Value);
var distinctWithNumbers = list.Where(s => regex.IsMatch(s)).Distinct(comparer);
I have created another fiddle with this approach.
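The reason GetHashCode matters is that Distinct is hash-based: two items are only compared with Equals when their hash codes collide. Below is a minimal sketch of a key-based comparer with that fix applied. The borrowed GenericCompare class is not shown in the question, so KeyComparer is a hypothetical stand-in whose shape is an assumption, not the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the borrowed GenericCompare class:
// two items are considered equal when their projected keys are equal.
public sealed class KeyComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object> _expr;

    public KeyComparer(Func<T, object> expr)
    {
        _expr = expr;
    }

    public bool Equals(T x, T y)
    {
        return object.Equals(_expr(x), _expr(y));
    }

    // Must agree with Equals: items with equal keys need equal hash codes,
    // otherwise the hash-based Distinct() may never call Equals on them.
    public int GetHashCode(T obj)
    {
        var key = _expr(obj);
        return key == null ? 0 : key.GetHashCode();
    }
}

public static class ComparerDemo
{
    private static readonly Regex regex = new Regex(@"^(?<num>\d+)\.", RegexOptions.Compiled);

    public static List<string> DistinctNumbered(IEnumerable<string> items)
    {
        // The key is the captured number, so "2. two" and "2.duplicate" compare equal.
        var comparer = new KeyComparer<string>(s => regex.Match(s).Groups["num"].Value);
        return items.Where(s => regex.IsMatch(s)).Distinct(comparer).ToList();
    }

    public static void Main()
    {
        var list = new[] { "1.one", "2. two", "no number", "2.duplicate", "300. three hundred", "4-ignore this" };
        DistinctNumbered(list).ForEach(Console.WriteLine);
    }
}
```

A comparer whose GetHashCode hashes the whole string would give "2. two" and "2.duplicate" different hash codes, which would explain why the attempt in the question kept both.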
Using lookahead regex
You can use any of these approaches with the lookahead regex @"^\d+(?=\.)". Just replace the lambda expression that reads the "num" group, s => regex.Match(s).Groups["num"].Value, with an expression that returns the whole match, s => regex.Match(s).Value.
Updated fiddle here.
Answer 2:
(I could mark this as answer too)
This solution works without duplicate regex runs:
var regex = new Regex(@"^\d+(?=\.)", RegexOptions.Compiled);
list.Select(i => {
var m = regex.Match(i);
return new KeyValuePair<int, string>( m.Success ? Int32.Parse(m.Value) : -1, i );
})
.Where(i => i.Key > -1)
.GroupBy(i => i.Key)
.Select(g => g.First().Value);
Run it in this fiddle.
Answer 3:
Your solution is good enough.
You can also use LINQ query syntax and the let keyword to avoid running the regex more than once per item, as follows:
var result =
from kvp in
(
from s in source
let m = regex.Match(s)
where m.Success
select new KeyValuePair<int, string>(int.Parse(m.Value), s)
)
group kvp by kvp.Key into gr
select gr.First().Value;
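Answer 4:
Something like this should work for the filtering part (note that it only keeps the items that match; it does not remove duplicate numbers):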
List<string> c = new List<string>()
{
"1.one",
"2. two",
"no number",
"2.duplicate",
"300. three hundred",
"4-ignore this"
};
c.Where(i =>
{
var match = Regex.Match(i, @"^\d+(?=\.)");
return match.Success;
});
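The filter above keeps every matching item, including "2.duplicate", and its result is never stored. One way it might be extended to also drop repeated numbers is to remember the numbers already seen in a HashSet. This is only a sketch; FilterAndDedupeDemo and DistinctNumbered are made-up names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterAndDedupeDemo
{
    private static readonly Regex regex = new Regex(@"^\d+(?=\.)");

    public static List<string> DistinctNumbered(IEnumerable<string> items)
    {
        var seen = new HashSet<string>(); // numbers that have already been emitted

        return items.Where(i =>
        {
            var match = regex.Match(i);
            // HashSet.Add returns false when the number is already in the set,
            // so later items with the same number are filtered out.
            return match.Success && seen.Add(match.Value);
        }).ToList(); // evaluate once, so 'seen' is not reused by a second enumeration
    }

    public static void Main()
    {
        var c = new List<string> { "1.one", "2. two", "no number", "2.duplicate", "300. three hundred", "4-ignore this" };
        DistinctNumbered(c).ForEach(Console.WriteLine);
    }
}
```

The numbers are compared as strings here, so "01." and "1." would count as different; parsing them with int.Parse, as Answer 2 does, would treat them as the same number.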
Source: https://stackoverflow.com/questions/25513372/distinct-by-part-of-the-string-in-linq