Finding a random position within a long string (Activeperl Build 822)

mwah has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Finding a random position within a long string (Activeperl Build 822) by kyle (Abbot) on Oct 12, 2007 at 21:29 UTC
Seems straightforward enough... `my $string = 'ABC' x 1_000_000; $string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg; print $string;` I get: `ABCABC?BC?BCABCABCABCA?CABCABCABCABCA?CABCABCABCABCAB?ABCABCABCABCABCABCABCABCA`... It's easy enough that I wonder if I'm misunderstanding your question. Update: The full output has 298993 question marks and 2701007 other characters, but of course that varies with every execution.
Re^2: Finding a random position within a long string (Activeperl Build 822) by lodin (Hermit) on Oct 12, 2007 at 23:57 UTC
You have a really nice Perlish solution. It's unnecessary to replace most of the string with itself, though, if you can stand using an experimental assertion. The substitution would then look like this: `s/(?(?{rand() >= 0.1})(?!))./?/g` Here's a benchmark comparing the two versions.

    use strict;
    use Benchmark qw(cmpthese timethese);

    my $str = '?' x 100_000;

    cmpthese(timethese(-10, {
        assertion => sub { $str =~ s/(?(?{rand() >= 0.1})(?!))./?/g; },
        eval      => sub { $str =~ s/(.)/rand() < 0.1 ? '?' : $1/eg; },
    }));

    __END__
    Benchmark: running assertion, eval for at least 10 CPU seconds...
     assertion: 11 wallclock secs (10.09 usr +  0.01 sys = 10.10 CPU) @  8.51/s (n=86)
          eval: 11 wallclock secs (10.02 usr +  0.02 sys = 10.04 CPU) @  3.09/s (n=31)
                Rate      eval assertion
    eval      3.09/s        --      -64%
    assertion 8.51/s      176%        --

lodin
Re^2: Finding a random position within a long string (Activeperl Build 822) by mwah (Hermit) on Oct 12, 2007 at 21:40 UTC
Your solution: `$string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg;` is very nice. This one really does work (in Win32/Activeperl). Thanks & Regards mwa
Re: Finding a random position within a long string (Activeperl Build 822) by FunkyMonk (Chancellor) on Oct 12, 2007 at 21:31 UTC
Depending on what you mean by "somehow equally distributed over the string", this may do the trick: `my $percent = 10; my $string = 'A' x 20; for ( 1 .. $percent / 100 * length $string ) { my $pos = int rand length $string; substr( $string, $pos, 1 ) = '?'; }` Update: int isn't needed
Re^2: Finding a random position within a long string (Activeperl Build 822) by mwah (Hermit) on Oct 12, 2007 at 21:47 UTC
FunkyMonk: "this may do the trick" Try: `my $percent = 10; my $string = 'A' x 1_000_000; for ( 1 .. $percent / 100 * length $string ) { my $pos = int rand length $string; substr( $string, $pos, 1 ) = '?'; } print $string =~ y/?//;` Something like this is the approach I initially took (and failed). If you check that out (with a larger string), you'll see the printout from the last line will approach 0xffff. That means (I'm struggling to say this) that in Activeperl/Win, the RAND_MAX of the underlying C library is promoted into Perl? Is this documented? Regards & Thanks mwa
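One way to probe the suspicion above is to ask the perl build how many random bits it reports and to simulate a coarse generator. This is only a minimal sketch: `coarse_rand` and `count_marks` are hypothetical helpers, and the 15-bit width is an assumption chosen for illustration, not a figure taken from the thread. `randbits` is the entry in the core `Config` module.

```perl
use strict;
use warnings;
use Config;

# Number of random bits this perl build reports for rand().
print "randbits: $Config{randbits}\n";

# Hypothetical helper: mimic a generator that can only return
# 2**$bits distinct values in [0, $n).
sub coarse_rand {
    my ( $n, $bits ) = @_;
    my $steps = 2**$bits;
    return int( rand $steps ) / $steps * $n;
}

# Replace 10% of the positions (with repeats allowed) and count
# how many '?' actually end up in the string.
sub count_marks {
    my ( $len, $bits ) = @_;
    my $string = 'A' x $len;
    for ( 1 .. 0.1 * $len ) {
        my $pos = int( coarse_rand( $len, $bits ) );
        substr( $string, $pos, 1 ) = '?';
    }
    return $string =~ tr/?//;
}

print "marks with 15 bits: ", count_marks( 1_000_000, 15 ), "\n";
```

With only 2**15 distinct return values, no more than 32768 different positions can ever be hit, however long the string is, so the count saturates well below the requested 10%.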
Re^2: Finding a random position within a long string (Activeperl Build 822) by kyle (Abbot) on Oct 12, 2007 at 21:39 UTC
This might replace slightly fewer characters than the percentage requested, because it can pick the same position more than once. Try setting `$percent = 90` for a good demonstration. I ran it ten times, and it never replaced more than 13/20 (well short of the expected 18/20).
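If an exact count matters, one option is to remember which positions were already used and draw again on a repeat. A minimal sketch, assuming that redrawing is acceptable (`mark_exact` is a hypothetical name, not code from the thread):

```perl
use strict;
use warnings;

# Replace int($percent/100 * length) characters, never counting
# the same position twice.
sub mark_exact {
    my ( $string, $percent ) = @_;
    my $len    = length $string;
    my $wanted = int( $percent / 100 * $len );
    my %seen;
    while ( keys(%seen) < $wanted ) {
        my $pos = int rand $len;
        next if $seen{$pos}++;    # already replaced, draw again
        substr( $string, $pos, 1 ) = '?';
    }
    return $string;
}

my $marked = mark_exact( 'A' x 20, 90 );
print "$marked\n";
print "replaced: ", ( $marked =~ tr/?// ), "\n";
```

For percentages close to 100 the redraws become frequent, so picking from a shuffled list of positions may be the better fit there.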
Re^3: Finding a random position within a long string (Activeperl Build 822) by FunkyMonk (Chancellor) on Oct 12, 2007 at 21:42 UTC
I took "about ~10%" to mean about 10% ;)
Re: Finding a random position within a long string (Activeperl Build 822) by BrowserUk (Patriarch) on Oct 12, 2007 at 22:19 UTC
Try applying your best tests to this (and let me know how it fares?): Updated: String is 3 * 1e6!

    #! perl -slw
    use strict;
    use Math::Random::MT qw[ rand ];

    sub shuffle {
        return unless defined wantarray;
        my $ref = @_ == 1 ? $_[ 0 ] : [ @_ ];
        for( 0 .. $#$ref ) {
            my $p = $_ + rand( @{ $ref } - $_ );
            @{ $ref }[ $_, $p ] = @{ $ref }[ $p, $_ ];
        }
        return wantarray ? @{ $ref } : $ref;
    }

    my $string = 'abc' x 1e6;
    my $len = length $string;
    my @picks = ( shuffle 0 .. $len-1 )[ 0 .. $len * 0.1 ];
    substr $string, $_, 1, '?' for @picks;

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re: Finding a random position within a long string (Activeperl Build 822) by johngg (Canon) on Oct 12, 2007 at 22:40 UTC
Here's another way that builds an array of random positions in which to place a '?' and then makes the substitutions using an array slice. Downside is the `split`ing and re`join`ing of the string. `use strict; use warnings; my $str = q{ABC} x 10_000; my @chars = split m{}, $str; my @randPosns = grep { rand 10 < 1 } 0 .. length($str) - 1; @chars[@randPosns] = (q{?}) x @randPosns; my $newStr = join q{}, @chars; print $newStr;` Cheers, JohnGG
Re: Finding a random position within a long string (Activeperl Build 822) by snopal (Pilgrim) on Oct 12, 2007 at 21:26 UTC
It seems to me that you cannot have "random" and "equally distributed" without knowing the sampling size after the process. Random is one thing, handling an equal distribution randomly is a whole 'nother ball of wax.
Re^2: Finding a random position within a long string (Activeperl Build 822) by mwah (Hermit) on Oct 12, 2007 at 22:39 UTC
snopal: "you can not have "random" and "equally distributed" without knowing the sampling size after the process" As I see this, if you take samples of length M = k * W (from the whole string of length W), then the mean ratio of the property in question ('?' to non '?') in each M-sample should approach the expected ratio R (e.g. 0.1), if the number of samples taken is large enough. (The variance of the property in equally sized M-samples shouldn't depend on their "position" within the whole ensemble.) But maybe I didn't understand your objection correctly? Regards mwa
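The claim about M-samples can be checked empirically by cutting the string into equal windows and looking at the fraction of '?' in each. A minimal sketch, assuming consecutive, non-overlapping windows are a fair stand-in for the samples described above (`window_fractions` and the window size of 10,000 are illustrative choices, not from the thread):

```perl
use strict;
use warnings;

# Fraction of '?' in each consecutive window of $size characters.
sub window_fractions {
    my ( $string, $size ) = @_;
    my @fractions;
    for ( my $off = 0; $off + $size <= length $string; $off += $size ) {
        my $chunk = substr $string, $off, $size;
        push @fractions, ( $chunk =~ tr/?// ) / $size;
    }
    return @fractions;
}

# Replace each character independently with probability 0.1.
my $string = 'ABC' x 100_000;
$string =~ s/(.)/rand() < 0.1 ? '?' : $1/eg;

my @f = window_fractions( $string, 10_000 );
printf "%d windows, min %.3f, max %.3f\n", scalar @f,
    ( sort { $a <=> $b } @f )[ 0, -1 ];
```

With independent per-character replacement, the minimum and maximum should both sit near 0.1. A string with every '?' packed at the front would instead show windows at 1.000 and 0.000, even though its overall ratio is the same.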
Re^3: Finding a random position within a long string (Activeperl Build 822) by snopal (Pilgrim) on Oct 13, 2007 at 14:03 UTC
By your statement, there is no distribution restriction "if the number of samples taken is large enough." By placing all 10% of your '?' at the beginning and doing a tremendous number of samples, your "data full of ?" or "data of no ?" should even out to 10%. You just have to take a large number of samples. Since neither the sample size, nor the number of samples is defined, it could just as easily be "one sample of ten percent plus one", the result of which would be that virtually every random solution would fail. This looks like homework. The level of sophistication of your request changed dramatically from the original problem statement to the response you gave me.