Fast intersection of two sorted integer arrays

说谎 2021-02-04 11:02

I need to find the intersection of two sorted integer arrays and do it very fast.

Right now, I am using the following code:

int i = 0, j = 0;

while (i          
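
The loop in the question is cut off in this copy, so its exact body is unknown. For orientation only, the usual baseline for this problem is a two-pointer merge over both sorted arrays. The sketch below shows that general technique; the class and method names are hypothetical, and it should not be read as the asker's actual code.

```csharp
using System;
using System.Collections.Generic;

public static class SortedIntersection
{
    // Hypothetical helper: classic two-pointer merge over two ascending arrays.
    public static List<int> Intersect(int[] arr1, int[] arr2)
    {
        var result = new List<int>();
        int i = 0, j = 0;

        while (i < arr1.Length && j < arr2.Length)
        {
            if (arr1[i] < arr2[j])
            {
                i++;                    // arr1 is behind, advance it
            }
            else if (arr2[j] < arr1[i])
            {
                j++;                    // arr2 is behind, advance it
            }
            else
            {
                result.Add(arr1[i]);    // equal values: part of the intersection
                i++;
                j++;
            }
        }

        return result;
    }
}
```

Each element of either array is visited at most once, so the cost is linear in the combined length.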


        
5 Answers
  •  -上瘾入骨i
    2021-02-04 11:36

    Yorye Nathan gave me the fastest intersection of two arrays with the last "unsafe code" method. Unfortunately it was still too slow for me: I needed to intersect combinations of arrays, and the number of combinations goes up to 2^32, which is a lot. With the following modifications and adjustments it became about 2.6 times faster. It relies on some pre-processing done beforehand, which you can surely do one way or another. I use only indexes instead of the actual objects, ids, or some other abstract comparison. For example, if you have to intersect big numbers like these:

    Arr1: 103344, 234566, 789900, 1947890
    Arr2: 150034, 234566, 845465, 23849854

    put everything into one array

    Combined: 103344, 234566, 789900, 1947890, 150034, 845465, 23849854

    and, for the intersection, use the sorted indexes into that combined array

    Arr1Index: 0, 1, 2, 3
    Arr2Index: 1, 4, 5, 6
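
The answer does not show how the indexes are produced. One possible way, sketched here as an assumption, is to give every distinct value an index in order of first appearance and then sort each mapped array. `IndexRemap` and `Build` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class IndexRemap
{
    // Hypothetical helper: assigns each distinct value an index in order of
    // first appearance (arr1's values first, then any new values from arr2)
    // and returns both inputs rewritten as sorted arrays of those indexes.
    public static (int[] Arr1Index, int[] Arr2Index, int Count) Build(int[] arr1, int[] arr2)
    {
        var indexOf = new Dictionary<int, int>();

        int[] Map(int[] source)
        {
            var mapped = new int[source.Length];
            for (int k = 0; k < source.Length; k++)
            {
                if (!indexOf.TryGetValue(source[k], out int idx))
                {
                    idx = indexOf.Count;          // next free index
                    indexOf.Add(source[k], idx);
                }
                mapped[k] = idx;
            }
            Array.Sort(mapped);                   // keep the index arrays ordered
            return mapped;
        }

        int[] a = Map(arr1);
        int[] b = Map(arr2);
        return (a, b, indexOf.Count);
    }
}
```

For the sample arrays above this yields the same index arrays as listed, with 7 distinct values in total.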

    Now we have smaller numbers with which we can build some other useful arrays. After taking the method from Yorye, I expanded Arr2Index into what is in theory a boolean array, but in practice a byte array because of the memory size implications:

    Arr2IndexCheck: 0, 1, 0, 0, 1, 1, 1

    This is more or less a dictionary that tells me, for any index, whether the second array contains it. As the next step, I avoided memory allocation inside the method, since that also took time; instead I pre-created the result array before calling the method, so while finding my combinations I never instantiate anything. Of course you have to track the used length of this array separately, so you may need to store it somewhere.
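
A minimal sketch of how such a lookup table could be built from the index array, assuming the total number of distinct indexes is known up front. `CheckArray.Build` and the `totalCount` parameter are hypothetical and not part of the original answer.

```csharp
using System;

public static class CheckArray
{
    // Hypothetical helper: expands a list of indexes into a byte lookup table,
    // where table[i] == 1 means "index i is present in the source array".
    public static byte[] Build(int[] indexes, int totalCount)
    {
        var check = new byte[totalCount];   // zero-initialised by the runtime
        foreach (int index in indexes)
        {
            check[index] = 1;
        }
        return check;
    }
}
```

Building the table costs one pass over the index array, so it pays off when the same array takes part in many intersections.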

    Finally the code looks like this:

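        // arr1:      sorted indexes of the first array
        // arr2Check: lookup table, 1 where the second array contains that index
        // result:    pre-allocated output buffer; returns the number of entries written
        public static unsafe int IntersectSorted2(int[] arr1, byte[] arr2Check, int[] result)
        {
            int length;

            fixed (int* pArr1 = arr1, pResult = result)
            fixed (byte* pArr2Check = arr2Check)
            {
                int* maxArr1Adr = pArr1 + arr1.Length;   // one past the last element of arr1
                int* arr1Value = pArr1;
                int* resultValue = pResult;

                while (arr1Value < maxArr1Adr)
                {
                    // Keep the index only if the second array also contains it.
                    if (*(pArr2Check + *arr1Value) == 1)
                    {
                        *resultValue = *arr1Value;
                        resultValue++;
                    }

                    arr1Value++;
                }

                // Pointer difference = number of matches written to result.
                length = (int)(resultValue - pResult);
            }

            return length;
        }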
    

    As you can see, the function returns the number of entries written to the result array; you can then do with it what you wish (resize it, keep it). Obviously the result array has to be at least as large as the smaller of arr1 and arr2.
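
To illustrate the calling pattern (one buffer allocated up front, with the returned length marking the valid prefix), here is a hedged sketch. It uses a pointer-free stand-in, `IntersectWithCheck`, rather than the unsafe method, so that it compiles without the unsafe option. That stand-in and `Demo` are hypothetical names.

```csharp
using System;

public static class BufferReuse
{
    // Safe stand-in for the unsafe method above (hypothetical name):
    // copies every index of arr1 that is flagged in arr2Check into result.
    public static int IntersectWithCheck(int[] arr1, byte[] arr2Check, int[] result)
    {
        int count = 0;
        foreach (int index in arr1)
        {
            if (arr2Check[index] == 1)
            {
                result[count++] = index;
            }
        }
        return count;
    }

    public static int[] Demo()
    {
        int[] arr1Index = { 0, 1, 2, 3 };
        byte[] arr2Check = { 0, 1, 0, 0, 1, 1, 1 };

        // Allocated once; the same buffer can be reused for every later call.
        int[] buffer = new int[arr1Index.Length];

        int length = IntersectWithCheck(arr1Index, arr2Check, buffer);

        // Only the first `length` entries are valid; copy them out if an
        // exactly sized array is needed.
        int[] exact = new int[length];
        Array.Copy(buffer, exact, length);
        return exact;
    }
}
```

In this sketch the buffer is sized to `arr1Index.Length`, which is always enough because the loop only ever writes values taken from the first array.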

    The big improvement is that I only iterate through the first array, which should ideally be the shorter of the two, so there are fewer iterations. Fewer iterations mean fewer CPU cycles.

    So here is a really fast intersection of two ordered arrays, in case you need really high performance ;).
