Find sequence in IEnumerable using Linq

后端 未结 5 1371
醉酒成梦
醉酒成梦 2020-12-03 23:01

What is the most efficient way to find a sequence within a IEnumerable using LINQ

I want to be able to create an extension method which allows

相关标签:
5条回答
  • 2020-12-03 23:15

    UPDATE: Given the clarification of the question my response below isn't as applicable. Leaving it for historical purposes.

    You probably want to use mySequence.Where(). Then the key is to optimize the predicate to work well in your environment. This can vary quite a bit depending on your requirements and typical usage patterns.

    It is quite possible that what works well for small collections doesn't scale well for much larger collections depending on what type T is.

    Of course, if the 90% use is for small collections then optimizing for the outlier large collection seems a bit YAGNI.

    0 讨论(0)
  • 2020-12-03 23:18

    Here's an implementation of an algorithm that finds a subsequence in a sequence. I called the method IndexOfSequence, because it makes the intent more explicit and is similar to the existing IndexOf method:

    public static class ExtensionMethods
    {
        public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence)
        {
            return source.IndexOfSequence(sequence, EqualityComparer<T>.Default);
        }
    
        public static int IndexOfSequence<T>(this IEnumerable<T> source, IEnumerable<T> sequence, IEqualityComparer<T> comparer)
        {
            var seq = sequence.ToArray();
    
            int p = 0; // current position in source sequence
            int i = 0; // current position in searched sequence
            var prospects = new List<int>(); // list of prospective matches
            foreach (var item in source)
            {
                // Remove bad prospective matches
                prospects.RemoveAll(k => !comparer.Equals(item, seq[p - k]));
    
                // Is it the start of a prospective match ?
                if (comparer.Equals(item, seq[0]))
                {
                    prospects.Add(p);
                }
    
                // Does current character continues partial match ?
                if (comparer.Equals(item, seq[i]))
                {
                    i++;
                    // Do we have a complete match ?
                    if (i == seq.Length)
                    {
                        // Bingo !
                        return p - seq.Length + 1;
                    }
                }
                else // Mismatch
                {
                    // Do we have prospective matches to fall back to ?
                    if (prospects.Count > 0)
                    {
                        // Yes, use the first one
                        int k = prospects[0];
                        i = p - k + 1;
                    }
                    else
                    {
                        // No, start from beginning of searched sequence
                        i = 0;
                    }
                }
                p++;
            }
            // No match
            return -1;
        }
    }
    

    I didn't fully test it, so it might still contain bugs. I just did a few tests on well-known corner cases to make sure I wasn't falling into obvious traps. Seems to work fine so far...

    I think the complexity is close to O(n), but I'm not an expert of Big O notation so I could be wrong... at least it only enumerates the source sequence once, whithout ever going back, so it should be reasonably efficient.

    0 讨论(0)
  • 2020-12-03 23:20

    You can use this library called Sequences to do that (disclaimer: I'm the author).

    It has a IndexOfSlice method that does exactly what you need - it's an implementation of the Knuth-Morris-Pratt algorithm.

    int startIndex = largeSequence.AsSequence().IndexOfSlice(subSequence);
    
    0 讨论(0)
  • 2020-12-03 23:34

    The code you say you want to be able to use isn't LINQ, so I don't see why it need be implemented with LINQ.

    This is essentially the same problem as substring searching (indeed, an enumeration where order is significant is a generalisation of "string").

    Since computer science has considered this problem frequently for a long time, so you get to stand on the shoulders of giants.

    Some reasonable starting points are:

    http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm

    http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm

    http://en.wikipedia.org/wiki/Rabin-karp

    Even just the pseudocode in the wikipedia articles is enough to port to C# quite easily. Look at the descriptions of performance in different cases and decide which cases are most likely to be encountered by your code.

    0 讨论(0)
  • 2020-12-03 23:38

    I understand this is an old question, but I needed this exact method and I wrote it up like so:

    public static int ContainsSubsequence<T>(this IEnumerable<T> elements, IEnumerable<T> subSequence) where T: IEquatable<T>
    {
        return ContainsSubsequence(elements, 0, subSequence);
    }
    
    private static int ContainsSubsequence<T>(IEnumerable<T> elements, int index, IEnumerable<T> subSequence) where T: IEquatable<T>
    {
        // Do we have any elements left?
        bool elementsLeft = elements.Any();
    
        // Do we have any of the sub-sequence left?
        bool sequenceLeft = subSequence.Any();
    
        // No elements but sub-sequence not fully matched
        if (!elementsLeft && sequenceLeft)
            return -1; // Nope, didn't match
    
        // No elements of sub-sequence, which means even if there are
        // more elements, we matched the sub-sequence fully
        if (!sequenceLeft)
            return index - subSequence.Count(); // Matched!
    
        // If we didn't reach a terminal condition,
        // check the first element of the sub-sequence against the first element
        if (subSequence.First().Equals(e.First()))
            // Yes, it matched - move onto the next. Consume (skip) one element in each
            return ContainsSubsequence(elements.Skip(1), index + 1 subSequence.Skip(1));
        else
            // No, it didn't match. Try the next element, without consuming an element
            // from the sub-sequence
            return ContainsSubsequence(elements.Skip(1), index + 1, subSequence);
    }
    

    Updated to not just return if the sub-sequence matched, but where it started in the original sequence.

    This is an extension method on IEnumerable, fully lazy, terminates early and is far more linq-ified than the currently up-voted answer. Bewarned, however (as @wai-ha-lee points out) it is recursive and creates a lot of enumerators. Use it where applicable (performance/memory). This was fine for my needs, but YMMV.

    0 讨论(0)
提交回复
热议问题