PerlMonks  

Selecting random records from an array

by Anonymous Monk
on Jun 03, 2003 at 21:15 UTC ( [id://262794]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I need to select 75% of the records in an array at random. I was thinking of using some for loops combined with rand, but does anyone know a quicker and better way of doing this?

Thanks,
Ralph.

Replies are listed 'Best First'.
Re: Selecting random records from an array
by Enlil (Parson) on Jun 03, 2003 at 21:20 UTC
    If you have 5.8 or later installed the following should work:
    use strict;
    use warnings;
    use List::Util 'shuffle';

    my @big = (qw(u n s o r t e d));
    my @shuffle = (shuffle(@big))[0 .. $#big*.75];
    print join "|",@shuffle;

    If not, you might want to glance at 'perldoc -q shuffle' for other ways to do this.

    -enlil

      It's too bad that they didn't bother to implement shuffle() so that it could efficiently stop early. But "worse is better", no?

      (tye)Re: Random Picks shows how. Other notes in that thread show alternate solutions.
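
For readers without the linked node at hand, here is a minimal sketch of what "stopping early" could look like: a Fisher-Yates shuffle that performs only as many swaps as elements are wanted. The helper name `pick_random` is hypothetical and this is not necessarily the code from the linked thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return $k randomly chosen
# elements of @$items using a Fisher-Yates shuffle that stops after $k swaps.
sub pick_random {
    my ($k, $items) = @_;
    my @pool = @$items;              # copy so the caller's array is untouched
    $k = @pool if $k > @pool;        # never ask for more than we have
    for my $i (0 .. $k - 1) {
        # choose a random index from the not-yet-fixed tail $i .. $#pool
        my $j = $i + int rand(@pool - $i);
        @pool[$i, $j] = @pool[$j, $i];
    }
    return @pool[0 .. $k - 1];
}

my @big    = qw(u n s o r t e d);
my @subset = pick_random(int(@big * 0.75), \@big);
print join('|', @subset), "\n";
```

Only the first $k positions are ever fixed, so the work is proportional to the number of elements picked rather than to the size of the array (apart from the defensive copy).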

                      - tye
•Re: Selecting random records from an array
by merlyn (Sage) on Jun 03, 2003 at 22:01 UTC

      my @subset = rand(75) <= 100, @input;

      I think it works better with grep and the values swapped :)

      my @subset = grep rand(100) < 75, @input;
      Or just:
      my @subset = grep rand() < .75, @input;

      Perhaps I misinterpreted your code, but in that case I have no idea what it is supposed to do.

      Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        He is creating a throttle or valve with the grep function, allowing only a certain stochastic percentage of @input applicants through the valve into the accepted @subset.

        Think of it as a lottery: anybody who flips a coin twice and gets a "heads" in either flip will win. That could be everyone, it could be nobody, but the odds are that about 75% of the players will win.
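
A small sketch of the lottery behaviour described above, assuming made-up input data. The helper name `keep_with_probability` is hypothetical; it simply wraps the grep from the earlier reply.

```perl
use strict;
use warnings;

# Hypothetical helper: let each element through the "valve" independently
# with probability $p, so the number kept is itself random.
sub keep_with_probability {
    my ($p, @items) = @_;
    return grep { rand() < $p } @items;
}

my @input = (1 .. 1000);   # stand-in data, not from the thread
for my $run (1 .. 3) {
    my @subset = keep_with_probability(0.75, @input);
    printf "run %d: kept %d of %d\n", $run, scalar(@subset), scalar(@input);
}
```

The counts printed will hover around 750 but differ between runs, which is the trade-off of this approach: it is a single pass and keeps the original order, but it yields roughly 75% rather than exactly 75%.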

        --
        [ e d @ h a l l e y . c c ]

Re: Selecting random records from an array
by I0 (Priest) on Jun 04, 2003 at 01:35 UTC
    my $n = @input*.75;
    my $m = @input;
    my @subset = grep rand($m--)<$n?$n--:0, @input;
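
The one-liner above is compact; the following is a spelled-out sketch of the same idea (keep each element with probability "still wanted" divided by "still left"). The helper name `sample_exact` is hypothetical, and the count is assumed to be truncated to an integer first so that the number of elements kept is exact.

```perl
use strict;
use warnings;

# Hypothetical helper: keep exactly $want elements of @items, in their
# original order, with every subset of that size equally likely.
sub sample_exact {
    my ($want, @items) = @_;
    my $left = @items;               # elements not yet examined
    my @subset;
    for my $item (@items) {
        # keep this element with probability (still wanted) / (still left)
        if (rand($left) < $want) {
            push @subset, $item;
            $want--;
        }
        $left--;
    }
    return @subset;
}

my @input  = ('a' .. 'h');
my @subset = sample_exact(int(@input * 0.75), @input);
print "@subset\n";   # always 6 of the 8 letters, still in alphabetical order
```

Unlike the plain `grep rand() < .75` filter, this returns a fixed number of elements, and unlike a shuffle it preserves the input order.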
Re: Selecting random records from an array
by blueAdept (Beadle) on Jun 04, 2003 at 21:31 UTC
    How about this? This seemed simple & quick, perhaps not the most robust/efficient. Any comments are welcome.

    @foo = qw( a b c d e f g h i j k l m n o p q r s t u v w y z );
    my @feh = sort { rand(1) >= .5 } @foo;    # randomly sort
    my $upper = int( scalar(@feh) * .75 );    # upper index to get 75% of records
    print join (",", @feh[ 0 .. $upper]);

    #Example output:
    #C:\>perl randsort.pl
    #q,r,p,j,o,d,a,b,c,g,e,h,f,i,l,k,m,n,s
    #C:\>perl randsort.pl
    #t,g,r,s,q,o,p,b,h,e,f,a,c,d,m,n,l,k,i

      That's not a valid sort routine, as "perldoc -f sort" says:

      The comparison function is required to behave. If it returns inconsistent results (sometimes saying $x[1] is less than $x[2] and sometimes saying the opposite, for example) the results are not well-defined.

        ... the results are not well-defined.

        Isn't that exactly why it works (for some definition of the term:)? The whole point of making a random selection is to achieve a "not-well defined" result?

        That said, this 'desort' method of shuffling an array doesn't stand up to analysis for another reason. Statistically, the results are extremely biased as I showed here. Note the abysmal standard deviation of this method (labelled 'qsort', as it was done under 5.6).

        Might be interesting to see what sort[sic] of results you would get using the default mergesort in 5.8.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


        You bring up a great point. It's a new day, and giving it another look, I'd imagine that using rand() within a sort algorithm might be bad -- perhaps causing infinite looping because the comparison between two items keeps changing. Apparently it doesn't, but perhaps the default sort might with a larger set, or its behavior might change in the future. How's this?

        I guess I should only do the rand once and memorize the result.

        my @foo = qw( a b c d e f g h i j k l m n o p q r s t u v w y z );
        my %comp;
        my @feh = sort {
            if (! defined $comp{$a} ) {
                if (rand(1) >= .5) { $comp{$a} = 1 } else { $comp{$a} = 0; }
            }
            $comp{$a};
        } @foo;
        my $upper = int( scalar(@feh) * .75 );
        print join (",", @feh[ 0 .. $upper]);

        #C:\>perl randsort.pl (output)
        #b,a,d,c,f,e,h,g,i,j,k,l,m,n,r,q,p,o,s
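
Note that the block above still returns only 0 or 1 and never a negative value, so two elements can each claim to sort after the other. One way the "do the rand once" idea could be turned into a consistent comparison is to give every element a random key before sorting and then compare keys with `<=>`. This is only a sketch; the helper name `shuffle_by_key` is hypothetical, and `List::Util::shuffle` remains the simpler choice where it is available.

```perl
use strict;
use warnings;

my @foo = qw( a b c d e f g h i j k l m n o p q r s t u v w y z );

# Hypothetical helper: give every element one random key, fixed before the
# sort starts, then order the indices by comparing those keys with <=>.
sub shuffle_by_key {
    my @items = @_;
    my @key   = map { rand() } @items;
    return @items[ sort { $key[$a] <=> $key[$b] } 0 .. $#items ];
}

my @feh   = shuffle_by_key(@foo);
my $count = int(@feh * 0.75);          # 18 of the 25 letters
print join(',', @feh[0 .. $count - 1]), "\n";
```

Because the keys never change during the sort, the comparison function behaves as "perldoc -f sort" requires. Sorting indices rather than values also keeps duplicate elements from sharing a key.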

Node Type: perlquestion [id://262794]
Approved by Mr. Muskrat
Front-paged by tye