Re: Finding a random position within a long string (Activeperl Build 822)
by kyle (Abbot) on Oct 12, 2007 at 21:29 UTC
|
my $string = 'ABC' x 1_000_000;
$string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg;
print $string;
I get: ABCABC?BC?BCABCABCABCA?CABCABCABCABCA?CABCABCABCABCAB?ABCABCABCABCABCABCABCABCA...
It's easy enough that I wonder if I'm misunderstanding your question.
Update: The full output has 298993 question marks and 2701007 other characters, but of course that varies with every execution.
|
s/(?(?{rand() >= 0.1})(?!))./?/g
Here's a benchmark comparing the two versions.
use strict;
use Benchmark qw(cmpthese timethese);
my $str = '?' x 100_000;
cmpthese(timethese(-10, {
assertion => sub {
$str =~ s/(?(?{rand() >= 0.1})(?!))./?/g;
},
eval => sub {
$str =~ s/(.)/rand() < 0.1 ? '?' : $1/eg;
},
}));
__END__
Benchmark: running assertion, eval for at least 10 CPU seconds...
 assertion: 11 wallclock secs (10.09 usr +  0.01 sys = 10.10 CPU) @  8.51/s (n=86)
      eval: 11 wallclock secs (10.02 usr +  0.02 sys = 10.04 CPU) @  3.09/s (n=31)
            Rate      eval assertion
eval      3.09/s        --      -64%
assertion 8.51/s      176%        --
lodin
Re: Finding a random position within a long string (Activeperl Build 822)
by FunkyMonk (Chancellor) on Oct 12, 2007 at 21:31 UTC
|
Depending on what you mean by "somehow equally distributed over the string", this may do the trick:
my $percent = 10;
my $string = 'A' x 20;
for ( 1 .. $percent / 100 * length $string ) {
my $pos = int rand length $string;
substr( $string, $pos, 1 ) = '?';
}
Update: int isn't needed
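A side note on the loop above: each position is drawn independently, so the same index can be picked more than once, and slightly fewer than $percent of the characters end up replaced. A rough estimate, assuming N positions and k uniform, independent draws (the symbols N and k are introduced here for illustration and are not from the post):

```latex
\mathbb{E}[\text{distinct positions}]
  = N\left(1 - \left(1 - \frac{1}{N}\right)^{k}\right)
  \approx N\left(1 - e^{-k/N}\right)
```

For k = 0.1 N this comes to about 0.095 N, i.e. roughly 9.5% of the string rather than 10%.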
|
FunkyMonk: this may do the trick
try:
my $percent = 10;
my $string = 'A' x 1_000_000;
for ( 1 .. $percent / 100 * length $string ) {
my $pos = int rand length $string;
substr( $string, $pos, 1 ) = '?';
}
print $string =~ y/?//;
Something like this is the approach I initially took (and failed with).
If you try that out (with a larger string), you'll see that the count
printed by the last line will approach 0xffff.
That means (I'm struggling to say this):
in ActivePerl on Windows, is the RAND_MAX of
the underlying C library carried through
into Perl's rand?
Is this documented?
Regards & Thanks
mwa
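One way to look into the question above is to ask perl how many random bits its rand() delivers: the Config module exposes this as $Config{randbits} (also visible with perl -V:randbits). The sketch below prints that value and, as a workaround for builds with few random bits, combines two rand() calls. The helper name wide_rand_int and the 15-bit chunk size are assumptions made for this illustration, not something from the thread.

```perl
use strict;
use warnings;
use Config;

# Number of random bits rand() delivers on this build of perl.
my $bits = $Config{randbits};
print "randbits: $bits\n";

# Hypothetical helper: combine two rand() calls so that positions beyond
# 2**15 can be reached even when rand() only has 15 random bits.
sub wide_rand_int {
    my ($n) = @_;
    my $chunk = 2**15;    # assumes at least 15 random bits per call
    my $r = int( rand $chunk ) * $chunk + int( rand $chunk );    # 0 .. 2**30 - 1
    return $r % $n;       # slight modulo bias, ignored in this sketch
}

my $pos = wide_rand_int(3_000_000);
print "position: $pos\n";
```

With a helper like this, the substr loop can address every position of a 3_000_000 character string, at the cost of two rand() calls per pick.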
|
I took "about ~10%" to mean about 10% ;)
Re: Finding a random position within a long string (Activeperl Build 822)
by BrowserUk (Patriarch) on Oct 12, 2007 at 22:19 UTC
|
Try applying your best tests to this (and let me know how it fares?):
Updated: String is 3 * 1e6!
#! perl -slw
use strict;
use Math::Random::MT qw[ rand ];
sub shuffle {
return unless defined wantarray;
my $ref = @_ == 1 ? $_[ 0 ] : [ @_ ];
for( 0 .. $#$ref ) {
my $p = $_ + rand( @{ $ref } - $_ );
@{ $ref }[ $_, $p ] = @{ $ref }[ $p, $_ ];
}
return wantarray ? @{ $ref } : $ref;
}
my $string = 'abc' x 1e6;
my $len = length $string;
my @picks = ( shuffle 0 .. $len-1 )[ 0 .. $len / 10 - 1 ];    # first 10% of the shuffled positions
substr $string, $_, 1, '?' for @picks;
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Finding a random position within a long string (Activeperl Build 822)
by johngg (Canon) on Oct 12, 2007 at 22:40 UTC
|
Here's another way: build an array of random positions in which to place a '?', then make the substitutions using an array slice. The downside is the splitting and rejoining of the string.
use strict;
use warnings;
my $str = q{ABC} x 10_000;
my @chars = split m{}, $str;
my @randPosns = grep { rand 10 < 1 } 0 .. length($str) - 1;
@chars[@randPosns] = (q{?}) x @randPosns;
my $newStr = join q{}, @chars;
print $newStr;
Cheers, JohnGG
Re: Finding a random position within a long string (Activeperl Build 822)
by snopal (Pilgrim) on Oct 12, 2007 at 21:26 UTC
|
It seems to me that you can not have "random" and "equally distributed" without knowing the sampling size after the process. Random is one thing; handling an equal distribution randomly is a whole 'nother ball of wax.
|
snopal: you can not have "random" and "equally distributed" without knowing the sampling size after the process
As I see this, if you take samples of length M = k * W (from
the whole string of length W), then the mean ratio of the property
in question ('?' to non '?') in each M-sample should approach the
expected ratio R (e.g. 0.1), if the number of samples taken
is large enough. (The variance of the property in equally sized
M-samples shouldn't depend on their "position" within the whole ensemble.)
But maybe I didn't understand your objection correctly?
Regards
mwa
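The criterion described above can be sketched as a quick check: fill a string with '?' at ratio R, cut it into equally sized windows of length M, and compare the '?' ratio in each window. The concrete values of $W, $M and $R below are assumptions picked for illustration only.

```perl
use strict;
use warnings;

my $R = 0.1;        # expected ratio of '?'
my $W = 100_000;    # length of the whole string
my $M = 10_000;     # window (sample) length, i.e. k = 0.1

my $string = 'A' x $W;
$string =~ s/(.)/rand() < $R ? '?' : $1/eg;

# Ratio of '?' in each consecutive, non-overlapping window of length M.
my @ratios;
for ( my $off = 0; $off + $M <= $W; $off += $M ) {
    my $window = substr $string, $off, $M;
    push @ratios, ( $window =~ tr/?// ) / $M;
}

printf "window %2d: %.4f\n", $_, $ratios[$_] for 0 .. $#ratios;
```

If the marks are spread evenly, every window should report a ratio close to $R regardless of its offset; a string with all marks at the front would show one window near 1 and the rest at 0.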
|
By your statement, there is no distribution restriction "if the number of samples taken is large enough." By placing all 10% of your '?' at the beginning and doing a tremendous number of samples, your "data full of ?" or "data of no ?" should even out to 10%. You just have to take a large number of samples.
Since neither the sample size, nor the number of samples is defined, it could just as easily be "one sample of ten percent plus one", the result of which would be that virtually every random solution would fail.
This looks like homework. The level of sophistication of your request changed dramatically from the original problem statement to the response you gave me.