
Re: Re: Re: string manipulation

by Xxaxx (Monk)
on Mar 30, 2001 at 01:15 UTC

in reply to Re: Re: string manipulation
in thread string manipulation

I recommend running the benchmark again using the actual functions.

In your referenced benchmark you're comparing:

$data =~ tr/a-z/A-Z/;
$data =~ s/([A-Za-z]+)/uc($1)/ge;

The uc($1) is a little different than:
$outstring =~ s/-/_/g;
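To make the difference concrete, here is a minimal sketch. The sample strings are made up for illustration, and the lowercase-only character class is an assumption rather than the pattern from the referenced node. With /e the replacement is run as Perl code for every match, while the hyphen replacement is a plain literal substitution:

```perl
use strict;
use warnings;

# Sample inputs, made up for illustration.
my $upper_tr = 'foo bar baz';
my $upper_s  = 'foo bar baz';

# Uppercasing: tr maps characters directly; s///ge calls uc() once per match.
$upper_tr =~ tr/a-z/A-Z/;
$upper_s  =~ s/([a-z]+)/uc($1)/ge;

my $hyphen_tr = 'foo-bar-baz';
my $hyphen_s  = 'foo-bar-baz';

# Hyphen to underscore: neither form evaluates code in the replacement.
$hyphen_tr =~ tr/-/_/;
$hyphen_s  =~ s/-/_/g;

print "$upper_tr | $upper_s\n";
print "$hyphen_tr | $hyphen_s\n";
```

Both pairs end up with identical strings, so any timing gap between the two pairs comes from how the work is done, not from what is produced.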

When I ran the actual benchmark I got the following results:

Run 1: (n=5000000)
Method One TR: 3 wallclock secs ( 3.00 usr + 0.00 sys = 3.00 CPU) @ 166666.67
Method Two S: 3 wallclock secs ( 3.05 usr + 0.00 sys = 3.05 CPU) @ 163934.43

Run 2: (n=5000000)
Method One TR: 2 wallclock secs ( 2.96 usr + 0.00 sys = 2.96 CPU) @ 168918.92
Method Two S: 3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 162337.66

Run 3: (n=5000000)
Method One TR: 4 wallclock secs ( 2.97 usr + 0.00 sys = 2.97 CPU) @ 168350.17
Method Two S: 3 wallclock secs ( 3.08 usr + 0.00 sys = 3.08 CPU) @ 162337.66

Seems like they are pretty much equal. But, alas, I'm real new to this benchmark stuff and I could have some weird caching issue.

Even so, without the eval of uc($1), this is certainly no 17-to-1 ratio like the one the referenced benchmark shows.

Hope this helps
p.s. Thanks to Desdinova for introducing me to the world of benchmarks.

Re: Re: Re: Re: string manipulation
by extremely (Priest) on Mar 30, 2001 at 03:31 UTC
    Post your code for this Benchmark and I'll show you where it went wrong. I'd bet on the variable you tested against not being in scope inside the benchmark sub/evals.
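The scoping problem can be sketched without Benchmark at all. As far as I understand it, Benchmark compiles code strings from inside its own module (in the caller's package), so a `my` variable in the calling file is not visible there while a package variable is. The helper `run_code_string` below is a hypothetical stand-in for that behaviour, not part of Benchmark:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a module that evals code strings on your behalf.
# It is defined before the lexical below, so that lexical is outside its scope.
sub run_code_string {
    my ($code) = @_;
    no strict 'vars';   # unknown names fall back to package variables
    no warnings;        # silence "used only once" noise for the demo
    return eval "package main; $code";
}

my  $lexical = 'foo-bar-baz';   # file-scoped lexical
our $global  = 'foo-bar-baz';   # package variable, i.e. $main::global

my $sees_lexical = run_code_string('defined $lexical ? 1 : 0');
my $sees_global  = run_code_string('defined $global  ? 1 : 0');

print "lexical visible: $sees_lexical\n";
print "global visible:  $sees_global\n";
```

If the string code silently operates on an undefined package variable, both methods time almost nothing, which would make them look equal.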

    My results, Linux on an IBM Netfinity (Intel)

    Benchmark: running regexp, transl, each for at least 10 CPU seconds...
        regexp: 10 wallclock secs (10.59 usr +  0.00 sys = 10.59 CPU) @ 37274.69/s (n=394739)
        transl: 13 wallclock secs (10.46 usr +  0.00 sys = 10.46 CPU) @ 313981.07/s (n=3284242)
               Rate regexp transl
    regexp  37275/s     --   -88%
    transl 313981/s   742%     --

    That is a significant differential there for this simple task. A full regexp engine is a big thing to throw at a lightweight string scan. My benchmark code follows:

    use strict;
    use Benchmark qw(cmpthese);
    use vars qw( $x );
    $x = 'This-is-a-test-string-I-just-typed-in-for-fun';
    cmpthese (-10, {
        'transl' => '$x =~ tr/-/_/; $x =~ tr/_/-/;',
        'regexp' => '$x =~ s/-/_/g; $x =~ s/_/-/g;',
    } );

    Oh yeah, I sure am happy Benchmark exists too. =)

    Doh! Update: that assignment was:
    $x = 'This_is_a_test_string_I_just_typed_in_for_fun';
    It wasn't result impacting, just stupid since it no-ops half my test. Interestingly, if I change the string to one with spaces rather than the '-' or '_' I wind up with regexp being 50-60% faster at doing nothing but scanning with no changes...

    $you = new YOU;
    honk() if $you->love(perl)

      Hey Extremely, Thanks for taking a look at the code and letting me know where the benchmark may have messed up:

      #!/usr/local/bin/perl -w
      use strict;
      use Benchmark;
      my $count = 500000;
      ## Method number one
      sub One {
          my $data = 'for bar baz';
          my($outstring);
          ($outstring = $data) =~ tr/-/_/;
      }
      ## Method number two
      sub Two {
          my $data = 'for bar baz';
          my($outstring) = $data;
          $outstring =~ s/-/_/g;
      }
      ## We'll test each one, with simple labels
      timethese ( $count, {
          'Method One TR' => '&One',
          'Method Two S'  => '&Two',
      } );
      exit;

        Your test strings don't have any '-' hyphens in them. =) And you change assignment forms between the two subs which honestly shouldn't have much effect but is still questionable practice when benchmarking since you want the code identical in nature except for the key point you are testing...

        Also, since the $data is set in the sub it will be reset every pass. As well, you can just set up the benchmark like this and avoid having perl eval a sub call in a string:

        timethese ( $count, {
            'Method One S'  => sub { my $data = 'foo-bar-baz'; $data =~ s/-/_/g; },
            'Method Two TR' => sub { my $data = 'foo-bar-baz'; $data =~ tr/-/_/; },
        } );

        Not a big deal but it may save you some typing in the future. Your way has the benefit of being easy to run the subs once and test their output, tho...
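Running each sub once before timing it might look like the sketch below. It assumes the subs are changed to return the modified string (the originals in this thread do not), and the test string is the hyphenated one suggested above:

```perl
use strict;
use warnings;

# Same two operations as in the thread, but each sub returns its result
# so the output can be inspected before any timing is done.
sub One { my $data = 'foo-bar-baz'; $data =~ tr/-/_/; return $data; }
sub Two { my $data = 'foo-bar-baz'; $data =~ s/-/_/g; return $data; }

my ($one, $two) = (One(), Two());
print "One: $one\nTwo: $two\n";

# Refuse to benchmark code that disagrees or that changes nothing.
die "methods disagree\n"     unless $one eq $two;
die "nothing was replaced\n" if $one eq 'foo-bar-baz';
```

A check like this would have caught a test string with no hyphens in it before any numbers were posted.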

        BTW, I moronically goofed my test string too... =)

        $you = new YOU;
        honk() if $you->love(perl)

Re: Re: Re: Re: string manipulation
by Desdinova (Friar) on Mar 30, 2001 at 03:57 UTC
    You have a valid point: in that node I was kind of comparing apples to oranges (of course, that node was about uppercasing input, which is a bit different). I must have been having a brain-dead kind of day. As for your numbers, I changed my benchmark code for this specific case to this:
    #!/usr/local/bin/perl -w
    use strict;
    use Benchmark;
    my $count = 900000;
    ## Method number one
    sub One {
        my $data = 'for-bar-baz';
        $data =~ tr/-/_/;
    }
    ## Method number two
    sub Two {
        my $data = 'for-bar-baz';
        $data =~ s/-/_/g;
    }
    ## We'll test each one, with simple labels
    timethese ( $count, {
        'Method One TR' => '&One',
        'Method Two s'  => '&Two',
    } );
    exit;
    Which results in the following
    Benchmark: timing 500000 iterations of Method One TR, Method Two s...
    Method One TR:  2 wallclock secs ( 1.87 usr +  0.00 sys =  1.87 CPU) @ 267379.68/s (n=500000)
    Method Two s:  5 wallclock secs ( 4.84 usr +  0.00 sys =  4.84 CPU) @ 103305.79/s (n=500000)
    FYI, I got these numbers using Perl 5.6.0 on Win32. You are right that this is not as big a difference as 17:1 (which seemed odd to me when I first ran it), but it is still a decent gap.
    I got the suggestion about not using s/// when tr/// will do from Effective Perl Programming (co-authored by our own merlyn). It is a great book with lots of info on tweaking your code.

    off to update that other node now...
    PS: your benchmark doesn't do anything in either case. The line my $data='for bar baz'; should be changed to contain the character being looked for, like so: my $data='for-bar-baz';
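One quick way to confirm that a test string actually contains the character being replaced is to count it with tr, which returns the number of characters it matched. The helper name `hyphen_count` is made up for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: count the hyphens in a candidate test string.
# tr with an empty replacement list and no /d leaves the string unchanged.
sub hyphen_count {
    my ($string) = @_;
    return ($string =~ tr/-//);
}

my $without = hyphen_count('for bar baz');   # string from the first benchmark
my $with    = hyphen_count('for-bar-baz');   # corrected string

print "for bar baz: $without hyphen(s)\n";
print "for-bar-baz: $with hyphen(s)\n";
```

A count of zero means both methods only scan the string and never replace anything, so the benchmark is measuring the scan alone.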