Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re^2: $str to %hash to @ary

by fizbin (Chaplain)
on Jul 24, 2004 at 14:48 UTC ( [id://377130]=note: print w/replies, xml ) Need Help??


in reply to Re: $str to %hash to @ary
in thread $str to %hash to @ary

After finding that oops above in my code, I thought I better put correctness tests in and publish the code I was using so that other people could tell me I was nuts if necessary.

So here again is the output of the benchmark code:

==== Correctness tests ==== Probabilities are: {6 => 27.000, 19 => 23.000, 21 => 33.000, 43 => 17. +000} origCode yielded {6 => 26.955, 19 => 22.825, 21 => 33.142, 43 => 17.07 +8} duff yielded {6 => 27.111, 19 => 22.956, 21 => 33.008, 43 => 16.925} fizbin yielded {6 => 27.117, 19 => 23.057, 21 => 32.862, 43 => 16.964} L~R yielded {6 => 26.929, 19 => 22.946, 21 => 33.151, 43 => 16.974} QM yielded {6 => 59.902, 19 => 0.000, 21 => 17.062, 43 => 23.036} QM failed ccn yielded {6 => 26.991, 19 => 22.882, 21 => 33.064, 43 => 17.063} ==== Speed tests ==== Rate origCode duff QM L~R ccn fizbin origCode 4101/s -- -34% -78% -83% -87% -89% duff 6191/s 51% -- -66% -75% -81% -83% QM 18348/s 347% 196% -- -26% -43% -50% L~R 24743/s 503% 300% 35% -- -24% -32% ccn 32411/s 690% 424% 77% 31% -- -11% fizbin 36567/s 792% 491% 99% 48% 13% --
And here's the code I used to make that determination. The formatting is from being run through perltidy, since it was even uglier before.
#!perl use strict; my $str = "17:43:33:21:23:19:27:6"; my $codehash = { origCode => sub { my @ary; my %hash = split /:/, $str; foreach my $k ( keys %hash ) { push( @ary, map { $hash{$k} } ( 1 .. $k ) ); } my $adId = $ary[ int( rand(100) ) ]; }, ccn => sub { my %hash = split /:/, $str; my $adno; my $rand = rand 100; my $sum = 0; for ( keys %hash ) { # there is no need of sorted keys $adno = $hash{$_}; last if ( $sum += $_ ) > $rand; } $adno; }, QM => sub { my %hash = reverse split /:/, $str; my $count; my %ad_lookup; foreach my $k ( keys %hash ) { $count += $hash{$k}; $ad_lookup{$count} = $k; } my $rand = rand(100); my $adid; foreach ( sort { $a <=> $b } keys %ad_lookup ) { $adid = $ad_lookup{$_} unless defined($adid); if ( $_ <= $rand ) { $adid = $ad_lookup{$_}; } else { last; } } $adid; }, "L~R" => sub { my $lookup; my %hash = reverse split /:/, $str; while ( my ( $key, $val ) = each %hash ) { $lookup .= pack( "C*", ($key) x $val ); } my $addid = unpack( "C", substr( $lookup, rand 100, 1 ) ); }, duff => sub { my @ary; my @a = split /:/, $str; @a % 2 && die; # not an even number of items while ( my ( $p, $ad ) = splice @a, 0, 2 ) { push @ary, ($ad) x $p; } my $adId = $ary[ int( rand(100) ) ]; }, fizbin => sub { my @a = split /:/, $str; @a % 2 && die; # not an even number of items my $r = int( rand(100) ); my ( $adId, $p ); while ( ( $p, $adId ) = splice @a, 0, 2 ) { if ( $r < $p ) { last; } $r -= $p; } $adId; } }; sub phash (%) { my %h = @_; return "{" . join( ", ", map { sprintf( '%s => %3.3f', $_, $h{$_} ); } sort { $a <=> $b } keys(%h) ) . "}"; } print "==== Correctness tests ==== \n\n"; my %strhash = reverse split( /:/, $str ); print "Probabilities are: ", phash(%strhash), "\n"; foreach my $subname ( keys %$codehash ) { my %resultshash = map { $_ => 0 } keys %strhash; do { $resultshash{ $codehash->{$subname}->() } += 0.001; } for ( 1 .. 100000 ); print "$subname yielded ", phash(%resultshash), "\n"; do { print "$subname failed\n" and last if ( abs( $resultshash{$_} - $strhash{$_} ) > 0.5 ); } for keys(%strhash); } print "\n==== Speed tests ====\n\n"; use Benchmark qw(cmpthese); cmpthese( -5, $codehash );

Replies are listed 'Best First'.
Re^3: $str to %hash to @ary
by Limbic~Region (Chancellor) on Jul 24, 2004 at 15:31 UTC
    fizbin,
    You have missed that I reversed the ordering of the string. As is, there is no way to have two adds with the same percentage nor is there a straight forward way of determining the percentage of any given add. I was trying to be creative, provide a significant speed increase, and maintain manageability.

    Additionally, your code is unfair. In your benchmark, all solutions are being timed for creating a structure to return an add and then returning that add each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy it but your solution does. You would need to alter the bench to be fair.

    Cheers - L~R

      I agree that the OP's data structure as described had some deficiencies (such as only one ad. per invidual probability value). Also, I think that the "reverse" I put in your code is evidence that I saw the string reversal. However, your main point is:
      Additionally, your code is unfair. In your benchmark, all solutions are being timed for creating a structure to return an add and then returning that add each time. The trouble is that, in my method (and others), we only have to invest that time once because we do not destroy it but your solution does. You would need to alter the bench to be fair.
      Well, if so I should then count time to serialize and deserialize that structure, since as far as I can tell the OP was looking for something that would be useful in a cgi script. (otherwise, his original complaints about performance make little sense) But for the sake of argument, let's refactor the benchmark so that each method has a setup step and a run step, and see what we get. I'll call the way I tested things before "cgi style" and the build-structure-once, extract-many method "mod_perl style". (I've eliminated QM's code until he can provide a corrected version, and added ccn's second version. I've also cut the correctness tests from this output for clarity)
      ==== Speed tests ==== As cgi: (from scratch each time): Rate origCode duff L~R ccn fizbin ccn_fast +ccn_faster origCode 3799/s -- -33% -83% -87% -89% -96% + -97% duff 5671/s 49% -- -75% -80% -84% -94% + -95% L~R 23009/s 506% 306% -- -19% -36% -75% + -79% ccn 28461/s 649% 402% 24% -- -21% -69% + -75% fizbin 36068/s 849% 536% 57% 27% -- -61% + -68% ccn_fast 92559/s 2337% 1532% 302% 225% 157% -- + -17% ccn_faster 111920/s 2846% 1874% 386% 293% 210% 21% + -- As mod_perl: (NOT from scratch each time): Rate fizbin ccn_fast ccn ccn_faster L~R origCo +de duff fizbin 36073/s -- -61% -62% -68% -92% -9 +2% -92% ccn_fast 92123/s 155% -- -2% -17% -80% -8 +0% -80% ccn 94311/s 161% 2% -- -15% -79% -8 +0% -80% ccn_faster 111200/s 208% 21% 18% -- -75% -7 +6% -76% L~R 450572/s 1149% 389% 378% 305% -- - +3% -3% origCode 462599/s 1182% 402% 391% 316% 3% +-- -1% duff 466249/s 1193% 406% 394% 319% 3% +1% --
      ccn_fast is the version ccn posted in response to my benchmark the first time. ccn_faster is a minor tweak I made to ccn_fast to have it not use an additional variable on repeated invocations.

      A few curious things to notice here - first off, using a mod_perl-style test, the original code is almost as fast as possible. (or at least, "almost as fast as what anyone else has put forward") So my first suggestion to the OP would be: if performance is your problem, find a way to switch to mod_perl and do the building of the array once. (but consider the disadvantages others have mentioned of your code only allowing one ad. per percentage value) Secondly, although the code for the methods duff and origCode is identical after the initial building of the data structure, duff was consistently (I ran this test program a few times) about 1% faster. I have no explanation for this, though I'd like one from someone with intimate knowledge of perl internals. (perl version 5.8.2 as shipped with cygwin)

      And yes, my code gets spanked in the mod_perl scenario, but that's expected. What I hadn't expected is just how blinding fast ccn_fast wound up being, enough so that it's still almost competitive under a mod_perl scenario.

      And now, the improved benchmark code:

      Finally, I want to note that I trust the "cgi style" tests of this code less than the previous benchmark code. In this version, code that has two halves (setup and eval) gets penalized extra in the cgi-style tests because of the additional indirection and sub invocation which didn't happen in the previous tests, and wouldn't happen in a real cgi script. I'm not quite sure what to do here; perhaps I could try to penalize the one-piece scripts in a similar fashion to compensate. (Or, for cgi-style tests, use the benchmark I posted earlier)
      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/
        fizbin,
        I knew I was trading speed in setup time for speed in lookup time. If you want to see me on top, change the following:
        "L~R" => sub { my @foo; my $str = "17:43:33:21:23:19:27:6"; while (my $p = $str =~ /([^:]+)/g && $1 and my $add_id = $str +=~ /([^:]+)/g && $1) { push @foo, ($add_id) x $p; } \@foo; }, "L~R" => sub { my $addid = $_[0]->[ rand 100 ]; }, As mod_perl: (NOT from scratch each time): Rate fizbin ccn_fast ccn ccn_faster origCode du +ff L~R fizbin 35680/s -- -52% -53% -60% -91% -9 +1% -92% ccn_fast 74642/s 109% -- -2% -16% -80% -8 +1% -83% ccn 76321/s 114% 2% -- -14% -80% -8 +1% -82% ccn_faster 88368/s 148% 18% 16% -- -76% -7 +8% -80% origCode 375784/s 953% 403% 392% 325% -- - +6% -13% duff 398928/s 1018% 434% 423% 351% 6% +-- -8% L~R 432419/s 1112% 479% 467% 389% 15% +8% --
        I did notice that not on every pass all methods were "passing". If someone wanted to make it even faster they would randomize the array as part of set up time and then just loop through the array.

        Cheers - L~R

Re^3: $str to %hash to @ary
by ccn (Vicar) on Jul 24, 2004 at 16:20 UTC

    nice test
    you may try also this one:

    ccn_fast => sub { my $rand = rand 100; my $mystr = $str; # this line is needed for repetitive test on +ly 1 while $mystr =~ /([^:]+):([^:]+):?/g and (($rand -= $1) > 0) +; $2; },
      Nice (see my reply to Lymbic~Region); that's fast. In multiple trials, it gets even faster (by ~ 20%) if you don't use a separate variable but instead reset pos($str)
      -- @/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://377130]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (4)
As of 2024-03-28 23:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found