Re^2: Opening random files then reading random lines from file.

Exactly how this works is complicated.

No, it isn't. How it works is as follows: we read a file, line by line. For the kth line read, we decide with probability 1/k whether we remember the line, forgetting any previous one.

That's fairly trivial. It takes just a little analysis to see it's fair. Fair means that any line will be the final pick. So, what's the probability of line k to be the final pick? First what needs to be happen is that it gets remembered when the line is read, which has a probability of 1/k. Then, any next lines must *not* be picked. For each of the next lines, this happens with probability (j -1)/j, where j is the line number. Since all those probabilities are independent of each other, we can multiply them:

           n
        -------
    1    |   |    j-1        1     k      1 
   ---   |   |    ---   ==  --- * --- == ---
    k    |   |     j         k     n      n
         j=k+1

Too bad we can't really do math in HTML.

Comment on Re^2: Opening random files then reading random lines from file.

Replies are listed 'Best First'.
Re^3: Opening random files then reading random lines from file. by Marshall (Canon) on Apr 27, 2012 at 11:56 UTC
I think you made my point that it is "complicated" (not easily explained and understood)..	[reply]
Re^4: Opening random files then reading random lines from file. by JavaFan (Canon) on Apr 27, 2012 at 13:10 UTC
You and I must have a wildly different idea what is complicated. If you find a one-liner using a bog standard file iteration loop complicating, I wonder what you think of `sort`.	[reply] [d/l]
Re^5: Opening random files then reading random lines from file. by SuicideJunkie (Vicar) on Apr 27, 2012 at 13:51 UTC
`1/k` is certainly not complicated in the least. The proof that it is correct is trivial for a proof. The only part I can see that could be called complex is the fact that the algorithm is not immediately obvious and needs a proof. (Even if just an engineering "proof"-by-example.)	[reply] [d/l]
Re^6: Opening random files then reading random lines from file. by JavaFan (Canon) on Apr 27, 2012 at 15:46 UTC


Don't ask to ask, just ask
	PerlMonks