Re: Opening random files (with bias) based on File::Stat information.

by xdg (Monsignor)
on Mar 19, 2006 at 14:05 UTC (#537744)

in reply to Opening random files (with bias) based on File::Stat information.

For the bias, something you might consider is this:

  • Sort files from newest to oldest
  • Generate a random number between 0 and 1
  • Treat that number as a cumulative probability and map it through the inverse of a bounded cumulative distribution function (CDF)
  • Scale the inverse to the length of your list
  • Pick a file using the scaled inverse as the index

If you pick a distribution that is weighted towards 0, you'll wind up picking newer files. Note: this isn't technically weighting by time -- it's biasing towards certain array slots, irrespective of whether those slots are close in access time or far apart. However, that may be sufficient for your particular application.

A good distribution for this may be the Kumaraswamy distribution, which is bounded between 0 and 1 and has a closed-form CDF that is easy to invert. By changing the two input parameters, you'll get different shapes, including ones that bias towards 0. (You'll have to try graphing some PDFs and see what you like.)
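For reference, here is where the closed-form inverse comes from. The symbols a and b are the two shape parameters (both positive), x is the value in [0, 1], and F is the cumulative probability; the invK function in the example calls them $Ka, $Kb and $F.

```latex
% Kumaraswamy density and CDF on 0 <= x <= 1, with a > 0, b > 0
f(x) = a\,b\,x^{a-1}\,(1 - x^{a})^{b-1}
\qquad
F(x) = 1 - (1 - x^{a})^{b}

% Solve F = 1 - (1 - x^a)^b for x
(1 - x^{a})^{b} = 1 - F
\;\Longrightarrow\;
x^{a} = 1 - (1 - F)^{1/b}
\;\Longrightarrow\;
x = \bigl(1 - (1 - F)^{1/b}\bigr)^{1/a}

% Median (F = 1/2), e.g. for a = 1.5, b = 6
x_{\mathrm{med}} = \bigl(1 - 2^{-1/b}\bigr)^{1/a} \approx 0.23
```

So with a = 1.5 and b = 6, roughly half of the picks should land in the first quarter of the list.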

Here's an example of how it could be used to bias in the way I described:

    use strict;
    use warnings;

    my $param_a = 1.5;
    my $param_b = 6;
    my @array   = ( 1 .. 100 );

    sub invK {
        my ($F, $Ka, $Kb) = @_;
        return ( 1 - ( 1 - $F )**( 1 / $Kb ) )**( 1 / $Ka );
    }

    for ( 1 .. 20 ) {
        my $pick = int( invK( rand(), $param_a, $param_b ) * @array );
        print "$pick\n";
    }

A test run gave this: 7 11 12 26 18 27 10 30 6 3 28 2 35 7 29 40 26 15 3 44
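To tie this back to actual files, here is a minimal sketch of the same idea applied to a list of filenames sorted by modification time. The helper name pick_biased_file, the use of mtime (element 9 of stat) rather than access time, and the choice of the current directory are all assumptions for illustration; swap in whichever timestamp and file list your application needs.

```perl
use strict;
use warnings;

# Inverse CDF of the Kumaraswamy distribution (same formula as invK).
sub inv_kumaraswamy {
    my ( $F, $Ka, $Kb ) = @_;
    return ( 1 - ( 1 - $F )**( 1 / $Kb ) )**( 1 / $Ka );
}

# Hypothetical helper: return one file from @$files, biased towards
# the most recently modified ones. Returns undef for an empty list.
sub pick_biased_file {
    my ( $files, $Ka, $Kb ) = @_;
    return unless @$files;

    # Cache each file's mtime (0 if stat fails), then sort newest first.
    my %mtime  = map { ( $_ => ( stat $_ )[9] // 0 ) } @$files;
    my @sorted = sort { $mtime{$b} <=> $mtime{$a} } @$files;

    # rand() is in [0, 1), so the scaled index is normally in range;
    # the clamp only guards against floating-point rounding.
    my $idx = int( inv_kumaraswamy( rand(), $Ka, $Kb ) * @sorted );
    $idx = $#sorted if $idx > $#sorted;
    return $sorted[$idx];
}

# Usage: pick from the regular files in the current directory.
my @files = grep { -f } glob('*');
my $pick  = pick_biased_file( \@files, 1.5, 6 );
print defined $pick ? "$pick\n" : "no files found\n";
```

As noted earlier, this biases by position in the sorted list, not by how far apart the timestamps are.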


Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.
