Optimizing array merge operation

backend · unresolved · 3 answers · 1094 views
耶瑟儿~ · 2021-01-19 09:52

I would appreciate any help given.

I have 7 separate arrays with approx. 90,000 numbers in each array (let's call them arrays1-arrays7). There are no duplicate numbers within each individual array.

3 Answers
  • 2021-01-19 10:47

    array_merge() gets significantly slower as the arrays grow, because (from php.net):

    If the input arrays have the same string keys, then the later value for that key will overwrite the previous one. If, however, the arrays contain numeric keys, the later value will not overwrite the original value, but will be appended.

    Values in the input array with numeric keys will be renumbered with incrementing keys starting from zero in the result array.

    So this function actually performs extra conditional checks on every element. You can replace array_merge() with plain appending: a loop (foreach or any other) and the [] operator. You can write a function imitating array_merge(), like this (using references so the array is not copied):

    // Append every element of $array2 onto the end of $array1.
    // $array1 is passed by reference, so it is modified in place rather than copied.
    function imitateMerge(&$array1, &$array2) {
        foreach ($array2 as $i) {
            $array1[] = $i;
        }
    }
    

    You should see a significant increase in speed.
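
    As a rough way to check this yourself, here is a minimal benchmark sketch; the array sizes are illustrative and it assumes the imitateMerge() function above is defined:

    // Two sample arrays of 90,000 unique integers each (illustrative sizes).
    $array1 = range(1, 90000);
    $array2 = range(90001, 180000);

    // Time the built-in merge.
    $start = microtime(true);
    $merged = array_merge($array1, $array2);
    printf("array_merge:  %.4fs\n", microtime(true) - $start);

    // Time the manual append loop.
    $start = microtime(true);
    imitateMerge($array1, $array2);
    printf("imitateMerge: %.4fs\n", microtime(true) - $start);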

  • 2021-01-19 10:54

    I suggest replacing

    foreach ($result as $key => $val) {
        if ($result[$key] == 3) {
            fwrite($fp_lines, $key . "\r\n");
        }
    }
    

    with something like:

    $res = array_keys(array_filter($result, function ($val) { return $val == 3; }));
    fwrite($fp_lines, implode("\r\n", $res));
    
  • 2021-01-19 10:58

    This is probably all wrong; see the last edit below.

    I also think array_merge is the problem, but my suggestion would be to implement a function that counts the values across the several arrays directly instead of merging first. This depends a little on how much overlap you have between your arrays. If the overlap is very small, this might not be much faster than merging, but with significant overlap (rand(0, 200000) to fill the arrays when I tried) it will be much faster.

    function arrValues($arrs) {
        $values = array();

        foreach ($arrs as $arr) {
            foreach ($arr as $val) {
                // Count how many times each value occurs across all arrays.
                if (array_key_exists($val, $values)) {
                    $values[$val]++;
                } else {
                    $values[$val] = 1;
                }
            }
        }
        return $values;
    }
    
    var_dump(arrValues(array(
        $files_array1,
        $files_array2,
        $files_array3,
        $files_array4,
        $files_array5,
        $files_array6,
        $files_array7
    )));
    

    The computation takes about 0.5s on my machine, then another 2s for printing the stuff.
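
    As a side note, if the values are integers or strings, PHP's built-in array_count_values() does this same counting; here is a sketch of applying it per array and summing the tallies, so nothing is ever merged (the $files_arrayN variables come from the question):

    $totals = array();
    foreach (array($files_array1, $files_array2, $files_array3, $files_array4,
                   $files_array5, $files_array6, $files_array7) as $arr) {
        // Count within one array, then add into the running totals.
        foreach (array_count_values($arr) as $val => $n) {
            $totals[$val] = isset($totals[$val]) ? $totals[$val] + $n : $n;
        }
    }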

    -edit-

    It's also not clear to me why you do the same thing 1000 times. Are the arrays different each time or something? Saying a bit about the reason might give people additional ideas...

    -edit again-

    After some more exploration I don't believe array_merge is at fault anymore. You don't have enough overlap to benefit that much from counting everything directly. Have you investigated the available memory on your machine? For me, merging 7 arrays with 90k elements each takes ~250M. If you have allowed PHP to use that much memory (which I assume you have, since you don't get any allocation errors), then maybe the problem is that the memory is simply not available and you get a lot of page faults?

    If that is not the problem: what kind of machine and what PHP version are you using? I've tested your original code on 5.5 and 5.4, and after fixing the memory issue it also runs in about 0.5s. That is per iteration, mind you. If you do this 1000 times in the same PHP script it will take a while, even more so considering that you allocate all this memory each time.
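
    If memory is the suspect, one quick way to test that theory is to raise PHP's memory limit for the script; the 512M value below is just an illustrative guess:

    // Allow this script to use up to 512 MB (illustrative value).
    ini_set('memory_limit', '512M');
    echo ini_get('memory_limit'), "\n"; // confirm the new limit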

    I believe you really should consider putting the data in a database. Given your numbers it seems you have ~500M rows in total. That is an awful lot to handle in PHP. A database makes it easy.
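
    A minimal sketch of the database route, with a hypothetical numbers(number INT) table holding one row per occurrence across the 7 arrays; the SQLite DSN and table name are assumptions:

    // Let the database count occurrences instead of PHP.
    $pdo = new PDO('sqlite:numbers.db'); // illustrative DSN
    $stmt = $pdo->query(
        'SELECT number FROM numbers GROUP BY number HAVING COUNT(*) = 3'
    );
    foreach ($stmt as $row) {
        echo $row['number'], "\r\n";
    }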
