I have an array, A = [a1,a2,a3,...aP]
with size P
. I have to sample q
elements from array A.
I plan to use a loop with q
You may construct second array, boolean with size P and store true for picked numbers. And when the numer is picked, check second table; in case "true" you must pick next one.
From perldoc perlfaq4:
How do I shuffle an array randomly?
If you either have Perl 5.8.0 or later installed, or if you have Scalar-List-Utils 1.03 or later installed, you can say:
use List::Util 'shuffle'; @shuffled = shuffle(@list);
If not, you can use a Fisher-Yates shuffle.
sub fisher_yates_shuffle { my $deck = shift; # $deck is a reference to an array return unless @$deck; # must not be empty! my $i = @$deck; while (--$i) { my $j = int rand ($i+1); @$deck[$i,$j] = @$deck[$j,$i]; } } # shuffle my mpeg collection # my @mpeg = <audio/*/*.mp3>; fisher_yates_shuffle( \@mpeg ); # randomize @mpeg in place print @mpeg;
You could also use List::Gen:
my $gen = <1..10>;
print "$_\n" for $gen->pick(5); # prints five random numbers
The other answers all involve shuffling the array, which is O(n)
.
It means modifying the original array (destructive) or copying the original array (memory intensive).
The first way to make it more memory efficient is not to shuffle the original array but to shuffle an array of indexes.
# Shuffled list of indexes into @deck
my @shuffled_indexes = shuffle(0..$#deck);
# Get just N of them.
my @pick_indexes = @shuffled_indexes[ 0 .. $num_picks - 1 ];
# Pick cards from @deck
my @picks = @deck[ @pick_indexes ];
It is at least independent of the content of the @deck, but its still O(nlogn) performance and O(n) memory.
A more efficient algorithm (not necessarily faster, depends on now big your array is) is to look at each element of the array and decide if it's going to make it into the array. This is similar to how you select a random line from a file without reading the whole file into memory, each line has a 1/N chance of being picked where N is the line number. So the first line has a 1/1 chance (it's always picked). The next has a 1/2. Then 1/3 and so on. Each pick will overwrite the previous pick. This results in each line having a 1/total_lines chance.
You can work it out for yourself. A one line file has a 1/1 chance so the first one is always picked. A two line file... the first line has a 1/1 then a 1/2 chance of surviving, which is 1/2, and the second line has a 1/2 chance. For a three line file... the first line has a 1/1 chance of being picked, then a 1/2 * 2/3 chance of surviving which is 2/6 or 1/3. And so on.
The algorithm is O(n) for speed, it iterates through an unordered array once, and does not consume any more memory than is needed to store the picks.
With a little modification, this works for multiple picks. Instead of a 1/$position
chance, it's $picks_left / $position
. Each time a pick is successful, you decrement $picks_left. You work from the high position to the low one. Unlike before, you don't overwrite.
my $picks_left = $picks;
my $num_left = @$deck;
my @picks;
my $idx = 0;
while($picks_left > 0 ) { # when we have all our picks, stop
# random number from 0..$num_left-1
my $rand = int(rand($num_left));
# pick successful
if( $rand < $picks_left ) {
push @result, $deck->[$idx];
$picks_left--;
}
$num_left--;
$idx++;
}
This is how perl5i implements its pick method (coming next release).
To understand viscerally why this works, take the example of picking 2 from a 4 element list. Each should have a 1/2 chance of being picked.
1. (2 picks, 4 items): 2/4 = 1/2
Simple enough. Next element has a 1/2 chance that an element will already have been picked, in which case it's chances are 1/3. Otherwise its chances are 2/3. Doing the math...
2. (1 or 2 picks, 3 items): (1/3 * 1/2) + (2/3 * 1/2) = 3/6 = 1/2
Next has a 1/4 chance that both elements will already be picked (1/2 * 1/2), then it has no chance; 1/2 chance that only one will be picked, then it has 1/2; and the remaining 1/4 that no items will be picked in which case it's 2/2.
3. (0, 1 or 2 picks, 2 items): (0/2 * 1/4) + (1/2 * 2/4) + (2/2 * 1/4) = 2/8 + 1/4 = 1/2
Finally, for the last item, there's a 1/2 the previous took the last pick.
4. (0 or 1 pick, 1 items): (0/1 * 2/4) + (1/1 * 2/4) = 1/2
Not exactly a proof, but good for convincing yourself it works.
You can suse the Fisher-Yates shuffle algorithm to randomly permute your array and then use a slice of the first q elements. Here's code from PerlMonks:
# randomly permutate @array in place
sub fisher_yates_shuffle
{
my $array = shift;
my $i = @$array;
while ( --$i )
{
my $j = int rand( $i+1 );
@$array[$i,$j] = @$array[$j,$i];
}
}
fisher_yates_shuffle( \@array ); # permutes @array in place
You can probably optimize this by having the shuffle stop after it has q
random elements selected. (The way this is written, you'd want the last q elements.)