Efficiently pick n random elements from PHP array (without shuffle)

独厮守ぢ 2020-11-29 12:05

I have the following code to pick $n elements from an array $array in PHP:

shuffle($array);
$result = array_splice($array, 0, $n);

5 answers
  • 2020-11-29 12:41

    This only pays off for small n compared to shuffling the whole array, but you could:

    1. Draw a random index r, n times, lowering the upper limit by 1 on each draw
    2. Adjust r to skip over indices that were already used
    3. Take the value at the adjusted index
    4. Record that index as used

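    Pseudocode (for a 0-indexed array):

    arr = []
    used = []                        // indices already taken, kept sorted ascending
    for i = 0..n-1:
        r = rand 0..len-1-i          // rank among the indices still available
        for each u in used:          // in ascending order
            if r >= u:
                r += 1               // step over an index that is already taken
        arr.append(array[r])
        insert r into used, keeping it sorted
    return arr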
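    A runnable PHP sketch of the same idea, assuming $array is a list with keys 0..count($array)-1 and capping $n at the array length. The function name pick_without_shuffle is hypothetical. The adjustment pass grows with the number of indices already used, so the total cost is roughly quadratic in $n, which is why it only helps when $n is small.

```php
<?php
// Sketch of the "adjust for previously used indices" method, in PHP.
// Assumes $array is a list with keys 0..count($array)-1.
// pick_without_shuffle is a hypothetical name, not a PHP built-in.
function pick_without_shuffle(array $array, int $n): array
{
    $len = count($array);
    $n = min($n, $len);               // cannot pick more elements than exist
    $picked = [];
    $used = [];                       // indices already taken, kept sorted ascending
    for ($i = 0; $i < $n; $i++) {
        $r = mt_rand(0, $len - 1 - $i);   // rank among the indices still available
        foreach ($used as $u) {           // must be ascending for the shifts to be right
            if ($r >= $u) {
                $r++;                     // step over an index that is already taken
            }
        }
        $picked[] = $array[$r];
        $used[] = $r;
        sort($used);                      // keep ascending for the next pick
    }
    return $picked;
}

print_r(pick_without_shuffle(range(10, 19), 4));
```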
    
  • 2020-11-29 12:52

    The trick is to use a variation of shuffle, in other words a partial shuffle.

    Performance is not the only criterion: statistical efficiency, i.e. unbiased sampling, is just as important, and the original shuffle solution does sample without bias.

    function random_pick( $a, $n ) 
    {
      $N = count($a);
      $n = min($n, $N);
      $picked = array_fill(0, $n, 0); $backup = array_fill(0, $n, 0);
      // partially shuffle the array, and generate unbiased selection simultaneously
      // this is a variation on fisher-yates-knuth shuffle
      for ($i=0; $i<$n; $i++) // O(n) times
      { 
        $selected = mt_rand( 0, --$N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
        $value = $a[ $selected ];
        $a[ $selected ] = $a[ $N ];
        $a[ $N ] = $value;
        $backup[ $i ] = $selected;
        $picked[ $i ] = $value;
      }
      // restore partially shuffled input array from backup
      // optional step: it can be skipped when $a is passed by value (as here), since the caller's array is not affected
      for ($i=$n-1; $i>=0; $i--) // O(n) times
      { 
        $selected = $backup[ $i ];
        $value = $a[ $N ];
        $a[ $N ] = $a[ $selected ];
        $a[ $selected ] = $value;
        $N++;
      }
      return $picked;
    }
    

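    NOTE: both loops run n times, so the work is O(n) in time and extra space; the selection is unbiased (it is a partial unbiased shuffle); and the output is a proper array with consecutive keys (no extra array_values() etc. needed). One caveat: because $a is passed by value, PHP's copy-on-write will usually duplicate the whole array the first time the function writes to it, which is O(N); passing $a by reference avoids that copy. The function also assumes $a has keys 0..N-1, which is why the associative case below goes through array_keys().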

    Use example:

    $randomly_picked = random_pick($my_array, 5);
    // or if an associative array is used
    $randomly_picked_keys = random_pick(array_keys($my_array), 5);
    $randomly_picked = array_intersect_key($my_array, array_flip($randomly_picked_keys));
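    A sketch of the same partial Fisher-Yates loop with $a taken by reference, assuming $a is a list with keys 0..count($a)-1. Passing by reference avoids the copy-on-write duplication, but then the restore loop is no longer optional: it is what leaves the caller's array unchanged. random_pick_ref is a hypothetical name.

```php
<?php
// By-reference variant of random_pick (hypothetical name).
// Assumes $a is a list with keys 0..count($a)-1.
function random_pick_ref(array &$a, int $n): array
{
    $N = count($a);
    $n = min($n, $N);
    $picked = [];
    $backup = [];
    // Partial Fisher-Yates: swap each pick to the current tail of the array.
    for ($i = 0; $i < $n; $i++) {
        $selected = mt_rand(0, --$N);
        $value = $a[$selected];
        $a[$selected] = $a[$N];
        $a[$N] = $value;
        $backup[$i] = $selected;
        $picked[$i] = $value;
    }
    // Undo the swaps in reverse order; required here, because $a is the caller's array.
    for ($i = $n - 1; $i >= 0; $i--) {
        $selected = $backup[$i];
        $value = $a[$N];
        $a[$N] = $a[$selected];
        $a[$selected] = $value;
        $N++;
    }
    return $picked;
}

$deck = range(1, 52);
$hand = random_pick_ref($deck, 5);   // $deck is back in its original order afterwards
```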
    

    For further variations and extensions of shuffling for PHP:

    1. PHP - shuffle only part of an array
    2. PHP shuffle with seed
    3. How can I take n elements at random from a Perl array?
  • 2020-11-29 12:58

    You could generate n random indices with mt_rand() and copy the corresponding values into a new array. To guard against the same index being returned twice, the index itself is used as the key in the new array: if that key already exists, a while loop keeps drawing until it gets an unused one. At the end, array_values() turns the result into a 0-indexed array. This assumes $array has keys 0..count($array)-1, and $n must not exceed count($array), or the while loop can never find a free index.

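    $n = min($n, count($array));     // cap $n so the while loop below can terminate
    $count = count($array) - 1;      // highest valid index (assumes keys 0..count-1)
    $new_array = array();
    for($i = 0; $i < $n; $i++) {
        $index = mt_rand(0, $count);
        // redraw until the index is unused; array_key_exists (unlike isset) also sees null values
        while(array_key_exists($index, $new_array)) {
            $index = mt_rand(0, $count);
        }
    
        $new_array[$index] = $array[$index];
    }
    $new_array = array_values($new_array);   // re-index as 0..n-1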
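    How long the while loop runs depends on how many indices are already taken. Assuming mt_rand() draws uniformly from the N = count($array) indices, the pick made when i indices are already used succeeds on each draw with probability (N - i)/N, so the expected total number of mt_rand() calls is:

```latex
\mathbb{E}[\text{draws}] = \sum_{i=0}^{n-1} \frac{N}{N-i} \;\approx\; n \quad (n \ll N),
\qquad
\sum_{i=0}^{N-1} \frac{N}{N-i} = N H_N \approx N \ln N \quad (n = N)
```

    where H_N = 1 + 1/2 + ... + 1/N is the N-th harmonic number. The approach is cheap for small n and slows down as n approaches N.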
    
  • 2020-11-29 13:04

    This function shuffles only $n elements, where $array is the array to pick from and $n is the number of random elements to retrieve. It also works on associative and sparse arrays, although it returns only the values, re-indexed from 0.

    Define $max_index as count($array) - 1 - $iteration.

    On each iteration the function draws a random number between 0 and $max_index, picks the key at that position, and overwrites that position with the key at $max_index. Because $max_index is one smaller on the next iteration, the picked key is out of range and can never be drawn again.

    In summary, this is Richard Durstenfeld's version of the Fisher-Yates shuffle, run for only $n steps instead of over the entire array.

    function rand_pluck($array, $n) {
        $array_keys = array_keys($array);
        $array_length = count($array_keys);
        $max_index = $array_length -1;
        $iterations = min($n, $array_length);
        $random_array = array();
        while($iterations--) {
            $index = mt_rand(0, $max_index);
            $value = $array_keys[$index];
            $array_keys[$index] = $array_keys[$max_index];
            array_push($random_array, $array[$value]);
            $max_index--;
        }
        return $random_array;
    }
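    rand_pluck returns a re-indexed list of values. If the keys matter, a variant could store key => value pairs instead, using the same partial Durstenfeld loop over array_keys($array). A sketch, where the name rand_pluck_assoc is hypothetical and the `> 0` loop condition makes a negative $n return an empty array:

```php
<?php
// Key-preserving variant of rand_pluck (hypothetical name).
function rand_pluck_assoc(array $array, int $n): array
{
    $keys = array_keys($array);
    $max_index = count($keys) - 1;
    $iterations = min($n, count($keys));
    $picked = [];
    while ($iterations-- > 0) {
        $index = mt_rand(0, $max_index);
        $key = $keys[$index];
        $keys[$index] = $keys[$max_index];  // move the last unpicked key into the freed slot
        $picked[$key] = $array[$key];       // keep the original key
        $max_index--;
    }
    return $picked;
}

$prices = ['apple' => 1.2, 'pear' => 0.9, 'plum' => 2.5, 'fig' => 3.1];
$two = rand_pluck_assoc($prices, 2);   // two of the four fruit => price pairs
```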
    
  • 2020-11-29 13:07

    $randomArray = [];
    while (count($randomArray) < 5) {
      $randomKey = mt_rand(0, count($array)-1);        // assumes keys 0..count($array)-1
      $randomArray[$randomKey] = $array[$randomKey];   // a repeated key just overwrites itself
    }
    

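    This returns exactly 5 distinct elements and is fast as long as 5 is small compared to count($array), because repeated keys are rare and only cost an extra pass through the loop. The original keys are preserved.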

    Note: make sure $array has at least 5 elements, or add a check, to prevent an endless loop.
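    One way to add that check, sketched as a function: $n is capped at the number of elements, and keys are drawn from array_keys() so arrays whose keys are not 0..count-1 also work. pick_unique is a hypothetical name; like the loop above, it keeps the original keys, in the order they were drawn.

```php
<?php
// Guarded version of the while loop above (hypothetical name).
function pick_unique(array $array, int $n): array
{
    $keys = array_keys($array);
    $n = min($n, count($keys));        // prevents the endless loop when $n > count($array)
    $last = count($keys) - 1;
    $result = [];
    while (count($result) < $n) {
        $key = $keys[mt_rand(0, $last)];
        $result[$key] = $array[$key];  // re-picking a key just overwrites it
    }
    return $result;
}

$five = pick_unique(range(100, 199), 5);
```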
