http://qs321.pair.com?node_id=1038944


in reply to Re: Fast Replacement (0.000025s)
in thread Fast Replacement

While normally I greatly respect your insight, appreciate your input, and value your code, in this case I feel I have to point out that sathishselvam doesn't seem to want to replace any "!" occurring in the first 50k bytes of the input string, but rather he wants to replace the first 50k occurrences of "!". davido seems to agree with me on this one.

Re^3: Fast Replacement (0.01 seconds)
by BrowserUk (Patriarch) on Jun 14, 2013 at 11:50 UTC

    Looking again I see you're right.

    But still, rather than invoking the regex engine 50,000 times, better to search for the position of the 50,000th ! and then replace in one pass.

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];

    my $s = '1234!' x 55e3;

    my $start = time;
    my( $p, $c ) = ( 0, 50e3 );
    1 while --$c and $p = 1+ index $s, '!', $p;
    substr( $s, 0, $p ) =~ tr[!][\n];
    printf "Took %f seconds\n", time() - $start;

    __END__
    C:\test>junk71;;
    Took 0.011771 seconds

    C:\test>junk71;;
    Took 0.009690 seconds
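For anyone wanting to reuse the idea, here is a minimal sketch of the same two-step approach (walk forward with index, then one tr over the prefix) wrapped in a sub. The name bangs_to_newlines and its return value are my own assumptions, not part of the post. It counts hits explicitly, because as far as I can tell the pre-decrement in `--$c` makes the one-liner above stop after 49,999 searches rather than 50,000, and it also copes with strings that hold fewer than $n occurrences.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn the first $n '!' in the
# string referenced by $sref into newlines, in place.
# Returns how many characters were actually replaced.
sub bangs_to_newlines {
    my ( $sref, $n ) = @_;
    my ( $end, $found ) = ( 0, 0 );

    # Step 1: find the end of the prefix that holds the first $n hits.
    while ( $found < $n ) {
        my $i = index $$sref, '!', $end;
        last if $i < 0;        # fewer than $n hits: stop at the last one
        $end = $i + 1;         # resume just past this hit
        ++$found;
    }
    return 0 unless $found;

    # Step 2: a single tr over that prefix. tr on an lvalue substr edits
    # the original string and returns the number of translations.
    return substr( $$sref, 0, $end ) =~ tr[!][\n];
}

my $demo = 'a!b!c!d!';
my $done = bangs_to_newlines( \$demo, 2 );
print "replaced $done\n";
```

Passing a reference avoids copying a large string into the sub; that is a design choice of this sketch, not something the post requires.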

    That could probably be sped up with a binary chop for the position, but it hardly seems worth it.
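One way to read "binary chop for the position", sketched here as an assumption rather than as what the author had in mind: use a counting tr on a prefix as the probe, and halve the range until the shortest prefix holding $n hits is found. The sub name is hypothetical, and I have not benchmarked it. Each probe rescans its prefix from the start, so whether it beats the index loop will depend on the data.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the shortest prefix of $$sref containing
# $n copies of '!', or -1 if the whole string has fewer than $n.
sub prefix_len_for_nth_bang {
    my ( $sref, $n ) = @_;
    return 0 if $n <= 0;

    # tr with an empty replacement list and no /d only counts matches.
    return -1 if ( $$sref =~ tr/!// ) < $n;

    # Invariant: count(prefix of $lo) < $n <= count(prefix of $hi).
    my ( $lo, $hi ) = ( 0, length $$sref );
    while ( $hi - $lo > 1 ) {
        my $mid   = int( ( $lo + $hi ) / 2 );
        my $count = substr( $$sref, 0, $mid ) =~ tr/!//;
        if ( $count >= $n ) { $hi = $mid } else { $lo = $mid }
    }
    return $hi;    # the character at offset $hi - 1 is the $n-th '!'
}

my $text = '1234!' x 4;
my $len  = prefix_len_for_nth_bang( \$text, 2 );
substr( $text, 0, $len ) =~ tr[!][\n] if $len > 0;
```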


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      For what it's worth, depending on how I measure, this is at least ten times faster than my solution. In real life it would be quite a bit more than 10x the speed of my solution -- I just used Benchmark to test, and that required making a copy of the input string on each test iteration so as not to mess with the original. Since the OP wouldn't be making copies (hopefully), that could be factored out, and would make the difference between our two algorithms all the more significant.


      Dave

        making a copy of the input string on each test iteration so as not to mess with the original.

        Indeed, it is a pig to benchmark. Here's my attempt.

        What I did was have the first iteration do tr[!][\n] and the second tr[\n][!], using a flag to keep track of odd & even. It also shows how the problem some people level at tr -- the need to know the lists at compile time -- can be addressed:
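The run-time tr trick can be sketched on its own as follows. This is a variation under my own assumptions, not the code used in the benchmark: the name make_tr, the escaping step, and the error check are additions. Escaping matters because the lists are pasted into source text, where a bare "/" would end the list early and a bare "-" would be read as a range.

```perl
use strict;
use warnings;

# Hypothetical helper: build, at run time, a sub that applies
# tr/$from/$to/ to its first argument in place and returns the number
# of characters translated.
sub make_tr {
    my ( $from, $to ) = @_;

    # $from and $to are copies, so editing them is safe. Escape the
    # characters that are special inside tr/.../.../: backslash, the
    # delimiter, and the range hyphen.
    for my $list ( $from, $to ) {
        $list =~ s{([\\/-])}{\\$1}g;
    }

    # tr builds its table at compile time, hence the string eval.
    my $code = eval "sub { \$_[0] =~ tr/$from/$to/ }";
    die "could not build tr/$from/$to/: $@" unless $code;
    return $code;
}

my $bang_to_nl = make_tr( '!', "\n" );
my $line       = 'a!b!';
my $changed    = $bang_to_nl->( $line );    # edits $line via the @_ alias
print "changed $changed\n";
```

Because the generated sub works through the @_ alias, it changes the caller's variable directly; only pass it lists you trust, since they end up inside a string eval.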

        #! perl -slw
        use strict;
        use Benchmark qw[ cmpthese ];

        sub makeTR{ eval "sub{ \$_[ 0 ] =~ tr[$_[0]][$_[1]] }"; }

        our $N //= 10;
        die "$N must be even and positive" if $N &1 or $N < 2;

        our $tr1 = makeTR( '!', "\n" );
        our $tr2 = makeTR( "\n", '!' );

        our $flag = 0;
        our $s = '1234!' x 55e3;

        cmpthese $N, {
            a => q[
                if( $flag ) {
                    my( $p, $c ) = ( 0, 50e3 );
                    1 while --$c and $p = index $s, "\n", $p;
                    $tr2->( substr $s, 0, $p );
                    $flag ^= 1;
                }
                else {
                    my( $p, $c ) = ( 0, 50e3 );
                    1 while --$c and $p = index $s, "!", $p;
                    $tr1->( substr $s, 0, $p );
                    $flag ^= 1;
                }
            ],
            b => q[
                if( $flag ) {
                    $s =~ s/\n(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)' })/!/g;
                    $flag ^= 1;
                }
                else {
                    $s =~ s/!(??{ ( $myregexp::count++ < 50000 ) ? '' : '(?!)' })/\n/g;
                    $flag ^= 1;
                }
            ],
        };

        And the results put tr 5x to 30x faster, so your benchmark isn't bad at all:

        C:\test>junk71 -N=2
                    (warning: too few iterations for a reliable count)
                    (warning: too few iterations for a reliable count)
          s/iter     a     b
        a   5.84    --  -85%
        b  0.899  550%    --

        C:\test>junk71 -N=4
                    (warning: too few iterations for a reliable count)
          s/iter     a     b
        a   5.81    --  -92%
        b  0.492 1081%    --

        C:\test>junk71 -N=10
          s/iter     a     b
        a   5.78    --  -95%
        b  0.273 2013%    --

        C:\test>junk71 -N=20
          s/iter     a     b
        a   5.74    --  -97%
        b  0.176 3167%    --


      poetry po·e·try /ˈpəʊɪtri/
      noun [mass noun]
          1 while --$c and $p = 1+ index $s, '!', $p;