问题

Given two arrays as parameters (x and y) and find the starting index where the first occurrence of y in x. I am wondering what the simplest or the fastest implementation would be.

Example:

when x = {1,2,4,2,3,4,5,6}
     y =       {2,3}
result
     starting index should be 3

Update: Since my code is wrong I removed it from the question.

回答1:

Here is a simple (yet fairly efficient) implementation that finds all occurances of the array, not just the first one:

static class ArrayExtensions {

  public static IEnumerable<int> StartingIndex(this int[] x, int[] y) {
    IEnumerable<int> index = Enumerable.Range(0, x.Length - y.Length + 1);
    for (int i = 0; i < y.Length; i++) {
      index = index.Where(n => x[n + i] == y[i]).ToArray();
    }
    return index;
  }

}

Example:

int[] x = { 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4 };
int[] y = { 2, 3 };
foreach (int i in x.StartingIndex(y)) {
  Console.WriteLine(i);
}

Output:

1
5
9

The method first loops through the x array to find all occurances of the first item in the y array, and place the index of those in the index array. Then it goes on to reduce the matches by checking which of those also match the second item in the y array. When all items in the y array is checked, the index array contains only the full matches.

Edit:
An alternative implementation would be to remove the ToArray call from the statement in the loop, making it just:

index = index.Where(n => x[n + i] == y[i]);

This would totally change how the method works. Instead of looping through the items level by level, it would return an enumerator with nested expressions, deferring the search to the time when the enumerator was iterated. That means that you could get only the first match if you wanted:

int index = x.StartingIndex(y).First();

This would not find all matches and then return the first, it would just search until the first was found and then return it.

回答2:

Simplest to write?

    return (from i in Enumerable.Range(0, 1 + x.Length - y.Length)
            where x.Skip(i).Take(y.Length).SequenceEqual(y)
            select (int?)i).FirstOrDefault().GetValueOrDefault(-1);

Not quite as efficient, of course... a bit more like it:

private static bool IsSubArrayEqual(int[] x, int[] y, int start) {
    for (int i = 0; i < y.Length; i++) {
        if (x[start++] != y[i]) return false;
    }
    return true;
}
public static int StartingIndex(this int[] x, int[] y) {
    int max = 1 + x.Length - y.Length;
    for(int i = 0 ; i < max ; i++) {
        if(IsSubArrayEqual(x,y,i)) return i;
    }
    return -1;
}

回答3:

The simplest way is probably this:

public static class ArrayExtensions
{
    private static bool isMatch(int[] x, int[] y, int index)
    {
        for (int j = 0; j < y.Length; ++j)
            if (x[j + index] != y[j]) return false;
        return true;
    }

    public static int IndexOf(this int[] x, int[] y)
    {
        for (int i = 0; i < x.Length - y.Length + 1; ++i)
            if (isMatch(x, y, i)) return i;
        return -1;
    }
}

But it's definitely not the fastest way.

回答4:

"Simplest" and "fastest" are opposites in this case, and besides, in order to describe fast algorithms we need to know lots of things about how the source array and the search array are related to each other.

This is essentially the same problem as finding a substring inside a string. Suppose you are looking for "fox" in "the quick brown fox jumps over the lazy dog". The naive string matching algorithm is extremely good in this case. If you are searching for "bananananananananananananananana" inside a million-character string that is of the form "banananananabanananabananabananabanananananbananana..." then the naive substring matching algorithm is terrible -- far faster results can be obtained by using more complex and sophisticated string matching algorithms. Basically, the naive algorithm is O(nm) where n and m are the lengths of the source and search strings. There are O(n+m) algorithms but they are far more complex.

Can you tell us more about the data you're searching? How big is it, how redundant is it, how long are the search arrays, and what is the likelihood of a bad match?

回答5:

This is based off of Mark Gravell's answer but I made it generic and added some simple bounds checking to keep exceptions from being thrown

private static bool IsSubArrayEqual<T>(T[] source, T[] compare, int start) where T:IEquatable<T>
{
    if (compare.Length > source.Length - start)
    {
        //If the compare string is shorter than the test area it is not a match.
        return false;
    }

    for (int i = 0; i < compare.Length; i++)
    {
        if (source[start++].Equals(compare[i]) == false) return false;
    }
    return true;
}

Could be improved further by implementing Boyer-Moore but for short patterns it works fine.

回答6:

I find something along the following lines more intuitive, but that may be a matter of taste.

public static class ArrayExtensions
{
    public static int StartingIndex(this int[] x, int[] y)
    {
        var xIndex = 0;
        while(xIndex < x.length)
        {
            var found = xIndex;
            var yIndex = 0;
            while(yIndex < y.length && xIndex < x.length && x[xIndex] == y[yIndex])
            {
                xIndex++;
                yIndex++;
            }

            if(yIndex == y.length-1)
            {
                return found;
            }

            xIndex = found + 1;
        }

        return -1;
    }
}

This code also addresses an issue I believe your implementation may have in cases like x = {3, 3, 7}, y = {3, 7}. I think what would happen with your code is that it matches the first number, then resets itself on the second, but starts matching again on the third, rather than stepping back to the index just after where it started matching. May be missing something, but it's definitely something to consider and should be easily fixable in your code.

回答7:

    //this is the best in C#

    //bool contains(array,subarray)
    //  when find (subarray[0])
    //      while subarray[next] IS OK
    //          subarray.end then Return True
    public static bool ContainSubArray<T>(T[] findIn, out int found_index,
 params T[]toFind)
    {
        found_index = -1;
        if (toFind.Length < findIn.Length)
        {

            int index = 0;
            Func<int, bool> NextOk = (i) =>
                {
                    if(index < findIn.Length-1)
                        return findIn[++index].Equals(toFind[i]);
                    return false;
                };
            //----------
            int n=0;
            for (; index < findIn.Length; index++)
            {
                if (findIn[index].Equals(toFind[0]))
                {
                    found_index=index;n=1;
                    while (n < toFind.Length && NextOk(n))
                        n++;
                }
                if (n == toFind.Length)
                {
                    return true;
                }
            }

        }
        return false;
    }

回答8:

using System;
using System.Linq;

public class Test
{
    public static void Main()
    {
        int[] x = {1,2,4,2,3,4,5,6};
        int[] y =       {2,3};
        int? index = null;

        for(int i=0; i<x.Length; ++i)
        {
            if (y.SequenceEqual(x.Skip(i).Take(y.Length)))
            {
                index = i;
                break;
            }
        }
        Console.WriteLine($"{index}");
    }
}

Output

来源：https://stackoverflow.com/questions/1780423/find-the-first-occurrence-starting-index-of-the-sub-array-in-c-sharp

标签

arrays

sub-array

Find the first occurrence/starting index of the sub-array in C#

问题