I have the following code to pick $n
elements from an array $array
in PHP:
shuffle($array);
$result = array_splice($array, 0, $n);
This will only show benifits for small n
compared to an array shuffle, but you could
r
n
times, each time decreasing the limit by 1
Pseudocode
arr = []
used = []
for i = 0..n-1:
r = rand 0..len-i
d = 0
for j = 0..used.length-1:
if r >= used[j]:
d += 1
arr.append($array[r + d])
used.append(r)
return arr
The trick is to use a variation of shuffle or in other words a partial shuffle.
performance is not the only criterion, statistical efficiency, i.e unbiased sampling is as important (as the original shuffle
solution is)
function random_pick( $a, $n )
{
$N = count($a);
$n = min($n, $N);
$picked = array_fill(0, $n, 0); $backup = array_fill(0, $n, 0);
// partially shuffle the array, and generate unbiased selection simultaneously
// this is a variation on fisher-yates-knuth shuffle
for ($i=0; $i<$n; $i++) // O(n) times
{
$selected = mt_rand( 0, --$N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
$value = $a[ $selected ];
$a[ $selected ] = $a[ $N ];
$a[ $N ] = $value;
$backup[ $i ] = $selected;
$picked[ $i ] = $value;
}
// restore partially shuffled input array from backup
// optional step, if needed it can be ignored, e.g $a is passed by value, hence copied
for ($i=$n-1; $i>=0; $i--) // O(n) times
{
$selected = $backup[ $i ];
$value = $a[ $N ];
$a[ $N ] = $a[ $selected ];
$a[ $selected ] = $value;
$N++;
}
return $picked;
}
NOTE the algorithm is strictly O(n)
in both time and space, produces unbiased selections (it is a partial unbiased shuffling) and produces output which is proper array with consecutive keys (not needing extra array_values
etc..)
Use example:
$randomly_picked = random_pick($my_array, 5);
// or if an associative array is used
$randomly_picked_keys = random_pick(array_keys($my_array), 5);
$randomly_picked = array_intersect_key($my_array, array_flip($randomly_picked_keys));
For further variations and extensions of shuffling for PHP:
You could generate n-times a random number with mt_rand()
and then fill these values in a new array. To go against the case where the same index gets returned twice we use the actual returned index to fill the new array and check always if the index exists in the new array, if so we use while to loop through it as long as we get a duplicate index. At the end we use array_values()
to get a 0-indexed array.
$count = count($array) - 1;
$new_array = array();
for($i = 0; $i < $n; $i++) {
$index = mt_rand(0, $count);
while(isset($new_array[$index])) {
$index = mt_rand(0, $count);
}
$new_array[$index] = $array[$index];
}
$new_array = array_values($new_array);
This function performs a shuffle on only $n
elements where $n
is the number of random elements you want to pick. It will also work on associative arrays and sparse arrays. $array
is the array to work on and $n
is the number of random elements to retrieve.
If we define the $max_index
as count($array) - 1 - $iteration
.
It works by generating a random number between 0 and $max_index
. Picking the key at that index, and replacing its index with the value at $max_index
so that it can never be picked again, as $max_index
will be one less at the next iteration and unreachable.
In summary this is the Richard Durstenfeld's Fisher-Yates shuffle but operating only on $n
elements instead of the entire array.
function rand_pluck($array, $n) {
$array_keys = array_keys($array);
$array_length = count($array_keys);
$max_index = $array_length -1;
$iterations = min($n, $array_length);
$random_array = array();
while($iterations--) {
$index = mt_rand(0, $max_index);
$value = $array_keys[$index];
$array_keys[$index] = $array_keys[$max_index];
array_push($random_array, $array[$value]);
$max_index--;
}
return $random_array;
}
$randomArray = [];
while (count($randomArray) < 5) {
$randomKey = mt_rand(0, count($array)-1);
$randomArray[$randomKey] = $array[$randomKey];
}
This will provide exactly 5 elements with no duplicates and very quickly. The keys will be preserved.
Note: You'd have to make sure $array had 5 or more elements or add some sort of check to prevent an endless loop.