Trying to understand array_diff_uassoc optimization

一向 2021-01-18 22:31

It seems that the arrays are sorted before being compared with each other inside array_diff_uassoc.

What is the benefit of this approach?

Test script

functio         


        
2 answers
  • 2021-01-18 22:57

    Theory

    Sorting allows for a few shortcuts to be made; for instance:

    A      | B
    -------+------
    1,2,3  | 4,5,6
    

    Each element of A will only be compared against B[0]: because B is sorted, its remaining elements are known to be at least as big as B[0].

    Another example:

    A      | B
    -------+-------
    4,5,6  | 1,2,6
    

    In this case, A[0] is compared against all elements of B, but A[1] and A[2] are compared against B[2] only.

    If any element of A is bigger than all elements of B, you will get the worst performance.
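The shortcut in the two tables above can be sketched as a small model. This is a Python illustration only, not the C code in php-src; the name `sorted_diff` and the way comparisons are counted (one three-way comparison per step) are assumptions made for the example.

```python
def sorted_diff(a, b):
    """Return (elements of a not in b, number of comparisons made).

    Both inputs are sorted first, so the scan position in b never
    has to move backwards.
    """
    a, b = sorted(a), sorted(b)
    result = []
    comparisons = 0
    j = 0  # current position in b, shared across all elements of a
    for x in a:
        found = False
        while j < len(b):
            comparisons += 1  # one three-way comparison of x and b[j]
            if b[j] < x:
                j += 1  # b[j] is too small for x and for every later x
            else:
                found = (b[j] == x)
                break  # b[j] >= x, so nothing further in b can match x
        if not found:
            result.append(x)
    return result, comparisons


first = sorted_diff([1, 2, 3], [4, 5, 6])   # every element only meets B[0]
second = sorted_diff([4, 5, 6], [1, 2, 6])  # A[0] scans B, the rest meet B[2]
```

In this model the first table costs 3 comparisons and the second costs 5, which matches the walk-through above.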

    Practice

    While the above works well for the standard array_diff() or array_udiff(), once a key comparison function is used it falls back to O(n * m) performance, because of a change that was made while trying to fix a bug.
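To make the O(n * m) fallback concrete, here is a rough Python model of a pairwise check that calls a user-supplied key comparison for every pair until a match is found. It is a sketch under assumptions, not the php-src implementation: `diff_uassoc_naive` is a hypothetical name, and comparing values through `str()` only approximates PHP's string comparison of values.

```python
def diff_uassoc_naive(a, b, key_cmp):
    """Return (entries of a with no matching key/value pair in b, callback calls)."""
    calls = 0
    result = {}
    for ka, va in a.items():
        matched = False
        for kb, vb in b.items():
            calls += 1  # one call to the user key comparison per pair
            if key_cmp(ka, kb) == 0 and str(va) == str(vb):
                matched = True
                break
        if not matched:
            result[ka] = va
    return result, calls


def key_cmp(x, y):
    # Three-way comparison in the style of a PHP comparison callback.
    return (x > y) - (x < y)


a = {"a": 1, "b": 2, "c": 3}
b = {"x": 1, "y": 2, "z": 3}
diff, calls = diff_uassoc_naive(a, b, key_cmp)  # no key matches: 3 * 3 calls
```

When no key of the first array appears in the second, the callback runs once for every pair, which is the n * m term.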

    The aforementioned bug describes how custom key comparison functions can cause unexpected results when used with arrays that have mixed keys (i.e. numeric and string key values). I personally feel that this should've been addressed via the documentation, because you would get equally strange results with ksort().

  • 2021-01-18 23:09

    The sorting algorithm didn't change in PHP 7. Elements are just passed to the sorting algorithm in a different order for some performance improvements.

    Well, the benefit could be faster execution in the end. You really hit the worst case when the two arrays have completely different keys.

    The worst-case complexity is sorting both arrays and then comparing every key of one array against every key of the other: O(n * m + n * log(n) + m * log(m))

    The best case is sorting both arrays and then just as many comparisons as there are elements in the smaller array: O(min(m, n) + n * log(n) + m * log(m))

    In case of a match, you wouldn't have to compare against the full array again, only from the key after the match onwards.
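To get a feel for how far apart the two bounds are, the formulas above can be evaluated for concrete sizes. This is only a back-of-the-envelope estimate: the function names are made up for the example, the logarithm is taken base 2, and constant factors are ignored.

```python
import math


def worst_case(n, m):
    # n * m key comparisons plus the cost of sorting both arrays
    return n * m + n * math.log2(n) + m * math.log2(m)


def best_case(n, m):
    # min(m, n) key comparisons plus the cost of sorting both arrays
    return min(m, n) + n * math.log2(n) + m * math.log2(m)


worst = worst_case(1000, 1000)
best = best_case(1000, 1000)
```

For two arrays of 1000 elements each, the worst-case estimate is slightly over a million steps and is dominated by the n * m term, while the best-case estimate is about 21 thousand and is dominated by the sorting.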

    But in the current implementation, the sorting is simply redundant. I think the implementation in php-src needs some improvement. There's no outright bug, but the implementation is just bad. If you understand some C: http://lxr.php.net/xref/PHP_TRUNK/ext/standard/array.c#php_array_diff (Note that this function is called from array_diff_uassoc via php_array_diff(INTERNAL_FUNCTION_PARAM_PASSTHRU, DIFF_ASSOC, DIFF_COMP_DATA_INTERNAL, DIFF_COMP_KEY_USER);)
