PerlMonks  

Hmm, that's not quite right. You've not eliminated the chance of duplicate values in the output. You want something more like:
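    {
        my %selected_set;
        # Using the chosen value as a hash key makes repeat picks harmless.
        my $choose_one = sub { $selected_set{ $input[rand @input] } = 1 };
        # Keep drawing until enough distinct values have been collected.
        $choose_one->() while keys %selected_set < $choose_count;
        my @selected = keys %selected_set;
    }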
The problem of non-termination is indeed something that will bite you when you least expect it: if the input contains fewer distinct values than $choose_count, the loop above never finishes. I believe it can only be solved probabilistically in the absence of a complete scan of the input, either by a shuffle or by somehow calculating a histogram of the set of input values. In a workflow situation I'd probably try to get the histogram precalculated for me; then you can use the numerical weights directly to make your selection, which scales better to large weights than duplicating input.
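
As a rough sketch of the histogram idea, assume the precalculated counts arrive as a hash mapping each distinct value to a positive weight. The names pick_weighted and %weight are made up for illustration, and this is only one way to do a weighted pick without replacement:

```perl
use strict;
use warnings;

# Hypothetical helper: choose up to $count distinct values from a histogram
# (value => positive weight). Heavier values are more likely to be picked.
sub pick_weighted {
    my ($weight, $count) = @_;
    my %left = %$weight;                 # work on a copy of the histogram
    my @picked;
    while (@picked < $count and %left) {
        my $total = 0;
        $total += $_ for values %left;
        my $r = rand $total;             # 0 <= $r < $total
        my $chosen;
        for my $value (sort keys %left) {
            $chosen = $value;
            $r -= $left{$value};
            last if $r < 0;              # $r fell inside this value's share
        }
        push @picked, $chosen;
        delete $left{$chosen};           # each value can be chosen only once
    }
    return @picked;
}

my %weight   = (apple => 5, pear => 1, plum => 3);   # made-up histogram
my @selected = pick_weighted(\%weight, 2);
```

Because every pass removes one key from the working copy, this loop cannot hang: asking for more values than exist simply returns all of them.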

So in the absence of that kind of knowledge, I see two ways of reducing the probability of a hang. The first is to use a shuffle for small datasets and random selection for large datasets, where the small/large division can be arbitrary, or determined dynamically by scanning the front of the dataset to make sure there are "enough" different values.
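
The first approach might look something like the sketch below. The cutoff $SMALL_LIMIT and the name choose_distinct are assumptions made for the example, not anything from the thread, and the cutoff value is arbitrary:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical cutoff between "small" and "large" inputs.
my $SMALL_LIMIT = 1_000;

sub choose_distinct {
    my ($input, $count) = @_;
    if (@$input <= $SMALL_LIMIT) {
        # Small input: a full scan is cheap, so dedupe, shuffle and truncate.
        my %seen;
        my @unique = shuffle grep { !$seen{$_}++ } @$input;
        $#unique = $count - 1 if @unique > $count;
        return @unique;
    }
    # Large input: random probing. This can still spin forever if the
    # input holds fewer than $count distinct values.
    my %selected;
    $selected{ $input->[rand @$input] } = 1 while keys(%selected) < $count;
    return keys %selected;
}

my @selected = choose_distinct([qw(a a b b c)], 2);
```

Note that the two branches are not statistically identical: the shuffle branch treats every distinct value as equally likely, while random probing favours values that occur more often in the input.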

The second probabilistic method is to count how many times you've made a random selection, and give up if the number of attempts far outweighs the number of desired values (and maybe print a warning, so you know why your program now takes five seconds to run instead of five microseconds). But running for five seconds and producing some output is a lot better than running forever and producing no output.
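
A minimal sketch of the give-up counter, assuming a cap of some multiple of the number of desired values. The name choose_with_limit and the default factor of 100 are invented for illustration:

```perl
use strict;
use warnings;

sub choose_with_limit {
    my ($input, $count, $max_factor) = @_;
    $max_factor //= 100;                 # assumed default; tune to taste
    return unless @$input;               # nothing to choose from
    my $max_attempts = $count * $max_factor;
    my %selected;
    my $attempts = 0;
    while (keys(%selected) < $count) {
        if (++$attempts > $max_attempts) {
            # Explain why the run was slow and the result is short.
            warn sprintf "giving up after %d attempts with %d of %d values\n",
                $max_attempts, scalar(keys %selected), $count;
            last;
        }
        $selected{ $input->[rand @$input] } = 1;
    }
    return keys %selected;
}

my @selected = choose_with_limit([1 .. 100], 3);
```

When the cap is hit, the caller still gets whatever distinct values were found, plus a warning on STDERR.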


In reply to Re^2: removing the goto by TimToady
in thread removing the goto by scoobyrico
