PerlMonks  

Finding a random position within a long string (Activeperl Build 822)

by mwah (Hermit)
on Oct 12, 2007 at 21:14 UTC ([id://644539])

mwah has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

Given a long string, like
my $string = 'ABC' x 1_000_000;
I'd like to have "some" characters changed (randomly) to '?', say about ~10%, which are somehow equally distributed over the string.

This should work on a Windows system (ActivePerl).
(Please don't misunderstand this question.)

Regards & Thanks

mwa
Re: Finding a random position within a long string (Activeperl Build 822)
by kyle (Abbot) on Oct 12, 2007 at 21:29 UTC

    Seems straightforward enough...

    my $string = 'ABC' x 1_000_000;
    $string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg;
    print $string;

    I get: ABCABC?BC?BCABCABCABCA?CABCABCABCABCA?CABCABCABCABCAB?ABCABCABCABCABCABCABCABCA...

    It's easy enough that I wonder if I'm misunderstanding your question.

    Update: The full output has 298993 question marks and 2701007 other characters, but of course that varies with every execution.
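    A minimal way to get counts like the ones in the update (a sketch, not necessarily how kyle counted them): tr/// with an empty replacement list returns the number of matching characters without modifying the string.

```perl
use strict;
use warnings;

my $string = 'ABC' x 1_000_000;
$string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg;

# tr/// with an empty replacement list only counts; the string is unchanged.
my $marks  = ( $string =~ tr/?// );
my $others = length($string) - $marks;
printf "%d question marks, %d other characters (%.2f%%)\n",
    $marks, $others, 100 * $marks / length $string;
```

    The exact numbers change on every run, as noted above.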

      You have a really nice Perlish solution. It's unnecessary to replace most of the string with itself though, if you can stand using an experimental assertion. The substitution would then look like this:

      s/(?(?{rand() >= 0.1})(?!))./?/g
      Here's a benchmark comparing the two versions.
      use strict;
      use Benchmark qw(cmpthese timethese);

      my $str = '?' x 100_000;

      cmpthese(timethese(-10, {
          assertion => sub {
              $str =~ s/(?(?{rand() >= 0.1})(?!))./?/g;
          },
          eval => sub {
              $str =~ s/(.)/rand() < 0.1 ? '?' : $1/eg;
          },
      }));

      __END__
      Benchmark: running assertion, eval for at least 10 CPU seconds...
       assertion: 11 wallclock secs (10.09 usr +  0.01 sys = 10.10 CPU) @  8.51/s (n=86)
            eval: 11 wallclock secs (10.02 usr +  0.02 sys = 10.04 CPU) @  3.09/s (n=31)
                  Rate      eval assertion
      eval      3.09/s        --      -64%
      assertion 8.51/s      176%        --
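      A third candidate one could add to the benchmark above (a swapped-in technique, not something posted in this thread): geometric skipping. Each character still becomes '?' independently with probability p, but rather than testing every character, the code jumps straight to the next hit, because the number of misses between hits follows a geometric distribution. The sub name mark_geometric is made up for illustration, and the sketch assumes 0 < p < 1.

```perl
use strict;
use warnings;

# Hypothetical helper: mark each character with '?' independently with
# probability $p, skipping ahead by geometrically distributed gaps.
# int( log($u) / log(1 - $p) ) with $u uniform on (0, 1] is the number of
# misses before the next hit. Assumes 0 < $p < 1 (log(0) dies for $p == 1).
sub mark_geometric {
    my ( $strref, $p ) = @_;
    my $len   = length $$strref;
    my $log_q = log( 1 - $p );
    my $pos   = -1;
    while (1) {
        my $u = 1 - rand();    # rand() is in [0, 1), so $u is in (0, 1]
        $pos += 1 + int( log($u) / $log_q );
        last if $pos >= $len;
        substr( $$strref, $pos, 1, '?' );
    }
    return;
}

my $string = 'ABC' x 1_000_000;
mark_geometric( \$string, 0.1 );
print scalar( $string =~ tr/?// ), "\n";    # roughly 300_000
```

      The work is proportional to the number of '?' placed rather than to the length of the string, which is where any speed-up would come from; whether it wins here would need to be measured.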

      lodin

      Your solution:
      $string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg;
      is very nice. This one really does work (on Win32/ActivePerl).

      Thanks & Regards

      mwa
Re: Finding a random position within a long string (Activeperl Build 822)
by FunkyMonk (Chancellor) on Oct 12, 2007 at 21:31 UTC
    Depending on what you mean by "somehow equally distributed over the string", this may do the trick:
    my $percent = 10;
    my $string  = 'A' x 20;
    for ( 1 .. $percent / 100 * length $string ) {
        my $pos = int rand length $string;
        substr( $string, $pos, 1 ) = '?';
    }

    Update: int isn't needed, because substr truncates a fractional offset to an integer anyway.

      FunkyMonk: this may do the trick

      Try:
      my $percent = 10;
      my $string  = 'A' x 1_000_000;
      for ( 1 .. $percent / 100 * length $string ) {
          my $pos = int rand length $string;
          substr( $string, $pos, 1 ) = '?';
      }
      print $string =~ y/?//;
      Something like this is the approach I initially took (and it failed).

      If you check that out (with a larger string), you'll see that the printout from the last line approaches 0xffff.

      Does that mean (I'm struggling to say this) that in ActivePerl on Windows, the RAND_MAX of the underlying C library is carried over into Perl's rand()?
      
      Is this documented?
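      A sketch for checking this (not from the thread): the core Config module records in $Config{randbits} how many random bits rand() was built to deliver. If that is 15 (a RAND_MAX of 0x7fff), rand() has only 32768 possible values, so int rand length $string can reach at most 32768 distinct positions no matter how long the string is. The value depends on the perl build; perls from 5.20 on use their own drand48 and report 48.

```perl
use strict;
use warnings;
use Config;

# $Config{randbits} is the number of random bits this perl's rand() was
# built with, so rand() can return at most 2**randbits distinct values.
my $bits     = $Config{randbits};
my $distinct = 2**$bits;
print "randbits = $bits: rand() has at most $distinct distinct values\n";

# int rand $len therefore reaches at most $distinct different positions,
# however long the string is.
my $len       = 3_000_000;
my $reachable = $distinct < $len ? $distinct : $len;
print "int rand $len can reach at most $reachable of $len positions\n";
```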

      Regards & Thanks

      mwa

      This might replace slightly fewer characters than the percentage requested, because it can pick the same position more than once. Try setting $percent = 90 for a good demonstration. I ran it ten times, and it never replaced more than 13/20 (well short of the expected 18/20).
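      One way to hit the requested count exactly (a sketch, not posted in the thread) is to remember which positions were already used and keep drawing until enough distinct ones are marked. Caveat: if rand() can only reach a limited number of positions, as mwa observed above, a target larger than that limit would make this loop run forever.

```perl
use strict;
use warnings;

my $percent = 90;
my $string  = 'A' x 20;
my $len     = length $string;
my $want    = int( $percent / 100 * $len );

# Keep drawing positions until $want *distinct* ones have been marked;
# %used remembers the positions already replaced.
my %used;
while ( keys(%used) < $want ) {
    my $pos = int rand $len;
    next if $used{$pos}++;    # skip positions we have already hit
    substr( $string, $pos, 1 ) = '?';
}
print "$string\n";
```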

        I took "about ~10%" to mean about 10% ;)

Re: Finding a random position within a long string (Activeperl Build 822)
by BrowserUk (Patriarch) on Oct 12, 2007 at 22:19 UTC

    Try applying your best tests to this (and let me know how it fares):

    Updated: String is 3 * 1e6!

    #! perl -slw
    use strict;
    use Math::Random::MT qw[ rand ];

    # In-place Fisher-Yates shuffle of an array ref (or of a list).
    sub shuffle {
        return unless defined wantarray;
        my $ref = @_ == 1 ? $_[ 0 ] : [ @_ ];
        for( 0 .. $#$ref ) {
            my $p = $_ + rand( @{ $ref } - $_ );
            @{ $ref }[ $_, $p ] = @{ $ref }[ $p, $_ ];
        }
        return wantarray ? @{ $ref } : $ref;
    }

    my $string = 'abc' x 1e6;
    my $len = length $string;
    # Take the first 10% of a shuffled list of all positions.
    my @picks = ( shuffle 0 .. $len-1 )[ 0 .. $len * 0.1 - 1 ];
    substr $string, $_, 1, '?' for @picks;
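    Shuffling the whole index list only to keep its first 10% does more swaps than needed. A variant sketch (not BrowserUk's code): stop the Fisher-Yates loop after k swaps; the first k slots are then a uniform random sample of k distinct positions. The helper name pick_distinct is made up for illustration. It uses core rand to stay self-contained; on a perl with few randbits, importing rand from Math::Random::MT as above would be the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: pick $k distinct positions out of 0 .. $n-1 by
# running only the first $k steps of a Fisher-Yates shuffle.
sub pick_distinct {
    my ( $n, $k ) = @_;
    my @idx = ( 0 .. $n - 1 );
    for my $i ( 0 .. $k - 1 ) {
        my $j = $i + int rand( $n - $i );    # $j uniform in $i .. $n-1
        @idx[ $i, $j ] = @idx[ $j, $i ];
    }
    return @idx[ 0 .. $k - 1 ];
}

my $string = 'abc' x 1e6;
my $len    = length $string;
my @picks  = pick_distinct( $len, int( $len * 0.1 ) );
substr( $string, $_, 1, '?' ) for @picks;
```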

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Finding a random position within a long string (Activeperl Build 822)
by johngg (Canon) on Oct 12, 2007 at 22:40 UTC
    Here's another way: it builds an array of random positions in which to place a '?' and then makes the substitutions using an array slice. The downside is the splitting and rejoining of the string.

    use strict;
    use warnings;

    my $str = q{ABC} x 10_000;
    my @chars = split m{}, $str;
    my @randPosns = grep { rand 10 < 1 } 0 .. length($str) - 1;
    @chars[@randPosns] = (q{?}) x @randPosns;
    my $newStr = join q{}, @chars;
    print $newStr;
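    The split/join step can be avoided by writing the '?' characters straight into the string with four-argument substr. This is a variant sketch, not JohnGG's posted code; the random positions are chosen the same way.

```perl
use strict;
use warnings;

my $str = q{ABC} x 10_000;

# Same random selection as above, but each '?' is written in place with
# 4-argument substr, so no character array is built or joined.
my @randPosns = grep { rand 10 < 1 } 0 .. length($str) - 1;
substr( $str, $_, 1, q{?} ) for @randPosns;
print $str, "\n";
```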

    Cheers,

    JohnGG

Re: Finding a random position within a long string (Activeperl Build 822)
by snopal (Pilgrim) on Oct 12, 2007 at 21:26 UTC

    It seems to me that you can not have "random" and "equally distributed" without knowing the sampling size after the process. Random is one thing; handling an equal distribution randomly is a whole 'nother ball of wax.

      snopal: you can not have "random" and "equally distributed" without knowing the sampling size after the process

      As I see this, if you take samples of length M = k * W (with 0 < k < 1) from the whole string of length W, then the mean ratio of the property in question ('?' to non-'?') in each M-sample should approach the expected ratio R (e.g. 0.1), if the number of samples taken is large enough. (The variance of the property in equally sized M-samples shouldn't depend on their "position" within the whole ensemble.)
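      A rough way to try this check empirically (a sketch; the window size M = 30_000, i.e. k = 0.01, is an arbitrary choice): mark the string with kyle's substitution, cut it into consecutive windows of length M, and compare the fraction of '?' across windows. Note that it measures '?' against all characters in the window, which is 0.1 for a 10% target.

```perl
use strict;
use warnings;
use List::Util qw(min max);

my $string = 'ABC' x 1_000_000;
$string =~ s/(.)/( rand 10 < 1 ) ? '?' : $1/eg;

# Cut the string into consecutive windows of length $M and record the
# fraction of '?' in each; an even spread keeps every window near 0.1.
my $M = 30_000;    # arbitrary window size, k = M / W = 0.01
my @frac;
for ( my $off = 0; $off + $M <= length $string; $off += $M ) {
    my $window = substr( $string, $off, $M );
    push @frac, ( $window =~ tr/?// ) / $M;
}
printf "%d windows, min %.4f, max %.4f\n",
    scalar @frac, min(@frac), max(@frac);
```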

      But maybe I didn't understand your objection correctly?

      Regards

      mwa

        By your statement, there is no distribution restriction "if the number of samples taken is large enough." By placing all 10% of your '?' at the beginning and doing a tremendous number of samples, your "data full of ?" or "data of no ?" should even out to 10%. You just have to take a large number of samples.

        Since neither the sample size, nor the number of samples is defined, it could just as easily be "one sample of ten percent plus one", the result of which would be that virtually every random solution would fail.

        This looks like homework. The level of sophistication of your request changed dramatically from the original problem statement to the response you gave me.

Node Type: perlquestion [id://644539]
Approved by Corion
Front-paged by Corion