Selecting random records from an array

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Selecting random records from an array by Enlil (Parson) on Jun 03, 2003 at 21:20 UTC
If you have 5.8 or later installed, the following should work:

```perl
use strict;
use warnings;
use List::Util 'shuffle';

my @big = (qw(u n s o r t e d));
my @shuffle = (shuffle(@big))[0 .. $#big*.75];
print join "\|", @shuffle;
```

If not, you might want to glance at 'perldoc -q shuffle' for other ways to do this.

-enlil
Re^2: Selecting random records from an array (worse) by tye (Sage) on Jun 03, 2003 at 23:37 UTC
It's too bad that they didn't bother to implement shuffle() so that it could efficiently stop early. But "worse is better", no? (tye)Re: Random Picks shows how. Other notes in that thread show alternate solutions. - tye
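One way to read "stop early" is a partial Fisher-Yates shuffle that runs only as many swap steps as there are records wanted. The sketch below is an illustration under that assumption, not the code from the linked thread; `pick_random` is a hypothetical helper name and is not part of List::Util.

```perl
use strict;
use warnings;

# Hypothetical helper: return $count randomly chosen elements of @items
# by running only the first $count steps of a Fisher-Yates shuffle.
sub pick_random {
    my ($count, @items) = @_;
    $count = @items if $count > @items;
    for my $i (0 .. $count - 1) {
        # Swap position $i with a random position in $i .. $#items.
        my $j = $i + int rand(@items - $i);
        @items[$i, $j] = @items[$j, $i];
    }
    return @items[0 .. $count - 1];
}

my @big    = qw(u n s o r t e d);
my @subset = pick_random(int(@big * .75), @big);
print join("|", @subset), "\n";
```

The work done is proportional to the number of records picked rather than to the size of the whole array, apart from the copy made when the list is passed in.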
•Re: Selecting random records from an array by merlyn (Sage) on Jun 03, 2003 at 22:01 UTC
If it has to be precisely 75% of the count, then other solutions in this thread are suitable. But if you just want "about 75%" of the data, this'll work: `my @subset = grep rand(100) <= 75, @input;`

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re: •Re: Selecting random records from an array by Juerd (Abbot) on Jun 03, 2003 at 22:25 UTC
`my @subset = rand(75) <= 100, @input;`

I think it works better with grep and the values swapped :) `my @subset = grep rand(100) < 75, @input;` Or just: `my @subset = grep rand() < .75, @input;`

Perhaps I misinterpreted your code, but in that case I have no idea what it is supposed to do.

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Re: Re: •Re: Selecting random records from an array by halley (Prior) on Jun 04, 2003 at 13:48 UTC
He is creating a throttle or valve with the grep function, allowing only a certain stochastic percentage of @input applicants through the valve into the accepted @subset. Think of it as a lottery: anybody who flips a coin twice and gets a "heads" in either flip will win. That could be everyone, it could be nobody, but the odds are that about 75% of the players will win.

-- `[ e d @ h a l l e y . c c ]`
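The lottery analogy can be checked with a quick run. This is a minimal sketch, assuming a made-up @input of 1000 numbers; it compares the grep filter with two literal coin flips, which win with the same chance (1 - .5 * .5 = .75).

```perl
use strict;
use warnings;

my @input = (1 .. 1000);

# Each element passes independently with probability .75,
# so the subset size varies from run to run.
my @subset = grep { rand() < .75 } @input;
printf "kept %d of %d\n", scalar(@subset), scalar(@input);

# The lottery: at least one head in two fair coin flips.
my @winners = grep { rand() < .5 || rand() < .5 } @input;
printf "winners %d of %d\n", scalar(@winners), scalar(@input);
```

Both counts should land near 750 on most runs, but neither is guaranteed to be exactly 75%.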
Re: Re: Re: •Re: Selecting random records from an array by Juerd (Abbot) on Jun 04, 2003 at 15:38 UTC
Re: Selecting random records from an array by I0 (Priest) on Jun 04, 2003 at 01:35 UTC
```perl
my $n = @input*.75;
my $m = @input;
my @subset = grep rand($m--)<$n?$n--:0, @input;
```
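The one-liner above appears to be selection sampling: each element is kept with probability (records still wanted) / (records still unexamined), which yields a fixed-size subset in the original order. A spelled-out sketch follows; `sample_in_order` is a hypothetical helper name, and the count is rounded down with int() here, which the original does not do.

```perl
use strict;
use warnings;

# Hypothetical helper spelling out the one-liner: keep exactly $want
# elements (assuming $want is an integer no larger than the input size),
# in their original order.
sub sample_in_order {
    my ($want, @input) = @_;
    my $left = @input;          # elements not yet examined
    my @subset;
    for my $item (@input) {
        # Keep this element with probability $want / $left.
        if (rand($left) < $want) {
            push @subset, $item;
            $want--;
        }
        $left--;
    }
    return @subset;
}

my @input  = ('a' .. 'h');
my @subset = sample_in_order(int(@input * .75), @input);
print "@subset\n";
```

Once the number still wanted equals the number still unexamined, every remaining element is kept, so the subset cannot come up short.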
Re: Selecting random records from an array by blueAdept (Beadle) on Jun 04, 2003 at 21:31 UTC
How about this? This seemed simple & quick, perhaps not the most robust/efficient. Any comments are welcome.

```perl
@foo = qw( a b c d e f g h i j k l m n o p q r s t u v w y z );
my @feh = sort { rand(1) >= .5 } @foo;   # randomly sort
my $upper = int( scalar(@feh) * .75 );   # upper index to get 75% of records
print join (",", @feh[ 0 .. $upper]);

#Example output:
#C:\>perl randsort.pl
#q,r,p,j,o,d,a,b,c,g,e,h,f,i,l,k,m,n,s
#C:\>perl randsort.pl
#t,g,r,s,q,o,p,b,h,e,f,a,c,d,m,n,l,k,i
```
Re: Re: Selecting random records from an array by tall_man (Parson) on Jun 04, 2003 at 22:09 UTC
That's not a valid sort routine, as "perldoc -f sort" says:

The comparison function is required to behave. If it returns inconsistent results (sometimes saying `$x[1]` is less than `$x[2]` and sometimes saying the opposite, for example) the results are not well-defined.
Re: Re: Re: Selecting random records from an array by BrowserUk (Patriarch) on Jun 04, 2003 at 22:48 UTC
"... the results are not well-defined."

Isn't that exactly why it works (for some definition of the term:)? The whole point of making a random selection is to achieve a "not well-defined" result?

That said, this 'desort' method of shuffling an array doesn't stand up to analysis for another reason. Statistically, the results are extremely biased as I showed here. Note the abysmal standard deviation of this method (labelled 'qsort', as it was done under 5.6). Might be interesting to see what sort[sic] of results you would get using the default mergesort in 5.8.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
Re: Re: Re: Selecting random records from an array by blueAdept (Beadle) on Jun 05, 2003 at 13:44 UTC
You bring up a great point. It's a new day, and giving it another look I'd imagine that using rand() within a sort algorithm might be bad -- perhaps causing infinite looping because the comparison between two items keeps changing. Apparently it doesn't, though, but perhaps the default sort might with a larger set, or might change in the future. How's this? I guess I should only do the rand once and memorize the result.

```perl
my @foo = qw( a b c d e f g h i j k l m n o p q r s t u v w y z );
my %comp;
my @feh = sort {
    if (! defined $comp{$a} ) {
        if (rand(1) >= .5) { $comp{$a} = 1 }
        else { $comp{$a} = 0; }
    }
    $comp{$a};
} @foo;
my $upper = int( scalar(@feh) * .75 );
print join (",", @feh[ 0 .. $upper]);

#C:\>perl randsort.pl (output)
#b,a,d,c,f,e,h,g,i,j,k,l,m,n,r,q,p,o,s
```
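For comparison, one way to keep a sort-based approach while giving sort a consistent comparison is to draw a single random key per element first and then sort on those keys (decorate, sort, undecorate). This is a swapped-in technique, not the poster's method, and the sketch assumes a 26-letter list and a count rounded down with int().

```perl
use strict;
use warnings;

my @foo = ('a' .. 'z');

# Draw one random key per element up front, then sort on those keys,
# so every comparison of the same pair gives the same answer.
my @feh = map  { $_->[1] }
          sort { $a->[0] <=> $b->[0] }
          map  { [ rand(), $_ ] } @foo;

my $count = int(@feh * .75);            # number of records to keep
print join(",", @feh[0 .. $count - 1]), "\n";
```

Slicing 0 .. $count - 1 returns exactly $count records, whereas 0 .. $upper in the code above returns one more than $upper.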

