Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.

Benchmarking the basic operations

by Falkkin (Chaplain)
on Jan 24, 2001 at 22:16 UTC ( [id://54064] : perlmeditation . print w/replies, xml ) Need Help??

As a pseudo-followup to the discussion on Testing a number for oddness, I decided I'd subject my computer to spending the better part of yesterday running benchmarks on the basic perl operators. Just thought I'd share the results with the monastery.

To begin with, here's the code I used:

use Benchmark; sub p_add { for my $i (1..100) { for (1..100) { $_ + $i } } } sub p_sub { for my $i (1..100) { for (1..100) { $_ - $i } } } sub p_mul { for my $i (1..100) { for (1..100) { $_ * $i } } } sub p_div { for my $i (1..100) { for (1..100) { $_ / $i } } } sub p_pwr { for my $i (1..100) { for (1..100) { $_ ** $i } } } sub p_mod { for my $i (1..100) { for (1..100) { $_ % $i } } } sub p_and { for my $i (1..100) { for (1..100) { $_ & $i } } } sub p_or { for my $i (1..100) { for (1..100) { $_ | $i } } } sub p_xor { for my $i (1..100) { for (1..100) { $_ ^ $i } } } sub p_not { for my $i (1..100) { for (1..100) { ~$_ } } } sub p_shl { for my $i (1..100) { for (1..100) { $_ << $i } } } sub p_shr { for my $i (1..100) { for (1..100) { $_ >> $i } } } sub p_gt { for my $i (1..100) { for (1..100) { $_ > $i } } } sub p_gte { for my $i (1..100) { for (1..100) { $_ >= $i } } } sub p_lt { for my $i (1..100) { for (1..100) { $_ < $i } } } sub p_lte { for my $i (1..100) { for (1..100) { $_ <= $i } } } sub p_eq { for my $i (1..100) { for (1..100) { $_ == $i } } } sub p_ne { for my $i (1..100) { for (1..100) { $_ != $i } } } timethese(100000, { add => \&p_add, sub => \&p_sub, mul => \&p_mul, div => \&p_div, pwr => \&p_pwr, mod => \&p_mod, and => \&p_and, or => \&p_or, xor => \&p_xor, not => \&p_not, shl => \&p_shl, shr => \&p_shr, gt => \&p_gt, gte => \&p_gte, lt => \&p_lt, lte => \&p_lte, eq => \&p_eq, ne => \&p_ne });

Doing the math shows that, for each operation, I've run a billion (10^9) trials. That's 100 runs through the inner loop, times 100 runs through the outer loop, times 100000 calls to the appropriate subroutine, per operation. Each operation took nearly an hour to run. I've tried to be as "fair" as possible with this benchmark; if anyone has suggestions on a Better Way To Do It, I'd be willing to let a new benchmark run monopolize my box for another day or two. ;)

I've sorted the various operators by type (comparative, logical, mathematical) and sorted each of these categories from "quickest" to "slowest", based on total CPU time.

Disclaimer: I hardly know anything about perl's internals, so you should consider my thoughts below as mere conjectures, to be corrected by a more knowledgeable monk. Chances are that these numbers are platform-dependent as well; your mileage may vary. These figures were tested on a 233 MHz Pentium I/MMX machine, running perl 5.6 on linux 2.2.17.

Comparative operators

lt: 2392 wallclock secs (2373.87 usr + 0.12 sys = 2373.99 CPU) @ 42. +12/s (n=100000) gte: 2392 wallclock secs (2374.10 usr + 0.18 sys = 2374.28 CPU) @ 42. +12/s (n=100000) eq: 2401 wallclock secs (2381.67 usr + 0.33 sys = 2382.00 CPU) @ 41. +98/s (n=100000) gt: 2405 wallclock secs (2387.47 usr + 0.10 sys = 2387.57 CPU) @ 41. +88/s (n=100000) lte: 2417 wallclock secs (2398.93 usr + 0.08 sys = 2399.01 CPU) @ 41. +68/s (n=100000) ne: 2470 wallclock secs (2411.78 usr + 1.70 sys = 2413.48 CPU) @ 41. +43/s (n=100000)
There is only about a 1.7% difference between the fastest and the slowest comparison operators. The only operation which appears to be noticeably slower than the others here is !=. I thought it was odd that the != operation took 5 times as much system time as the other operations... any monk out there have any idea why this might be the case? Since >= and < took almost exactly the same amount of time, I conjectured that $foo >= $bar might get translated to $bar < $foo by the compiler, but this appears to not be the case:
perl -MO=Deparse,-p -we 'print 1 if $foo >= $bar' (($foo >= $bar) and print(1)); -e syntax OK
Nor is it true the other way around:
perl -MO=Deparse,-p -we 'print 1 if $foo < $bar' (($foo < $bar) and print(1)); -e syntax OK

Logical operators

not: 2231 wallclock secs (2225.95 usr + 0.45 sys = 2226.40 CPU) @ 44. +92/s (n=100000) shr: 2653 wallclock secs (2478.63 usr + 3.47 sys = 2482.10 CPU) @ 40. +29/s (n=100000) and: 2513 wallclock secs (2493.72 usr + 0.31 sys = 2494.03 CPU) @ 40. +10/s (n=100000) or: 2549 wallclock secs (2534.75 usr + 0.39 sys = 2535.14 CPU) @ 39. +45/s (n=100000) shl: 2921 wallclock secs (2507.25 usr + 6.75 sys = 2514.00 CPU) @ 39. +78/s (n=100000) xor: 2797 wallclock secs (2560.60 usr + 2.05 sys = 2562.65 CPU) @ 39. +02/s (n=100000)
These tests show that bitwise negation (the not benchmark) is 11.5% faster than any of the other logical operators. This makes sense since bitwise negation is a unary operator (it works with only one value), as opposed to the other logical operators, which all take two operands. The differences between the binary operators are not as striking; there's only a 3.2% difference between the fastest (right shift) and slowest (exclusive or). Both left-shift and right-shift use more system time than the rest; again, I have no idea why this is the case.

Mathematical operators

add: 2841 wallclock secs (2682.82 usr + 0.67 sys = 2683.49 CPU) @ 37. +26/s (n=100000) mul: 3048 wallclock secs (2696.55 usr + 2.84 sys = 2699.39 CPU) @ 37. +05/s (n=100000) sub: 2743 wallclock secs (2734.40 usr + 0.55 sys = 2734.95 CPU) @ 36. +56/s (n=100000) mod: 2878 wallclock secs (2755.51 usr + 0.88 sys = 2756.39 CPU) @ 36. +28/s (n=100000) div: 3018 wallclock secs (2989.89 usr + 0.65 sys = 2990.54 CPU) @ 33. +44/s (n=100000) pwr: 3883 wallclock secs (3549.47 usr + 5.42 sys = 3554.89 CPU) @ 28. +13/s (n=100000)
The exponentiation operator (**) took a good 18.5% longer than any other mathematical operator. This makes intuitive sense, merely because it takes more work to compute a power than to do any of the other operators. For the sake of discussion, I'll consider ** an outlier for now, and focus on analyzing the other operators in this category.

The addition operator is, unsurprisingly, the fastest of the bunch. I found it mildly surprising that multiplication is only 0.6% slower than addition. A typical assumption I've made in the past is that a multiplication "should" take longer than a comparable addition, but this appears to not be the case.

Subtraction runs about 1.9% slower than addition. I assume this is because computing a subtraction involves finding the negation of the second operand and adding the result with the first operand; this probably happens in-hardware, instead of in-perl, but it's still a nice thing to know for speed-critical optimizations.

Modular division (%) does appear to be slower than logical and (&), which contradicts the results I got in Testing a number for oddness. I think the benchmarks presented here are more accurate than the original ones, because I let them run longer and because I think they are crafted better (there are now fewer subroutine calls per amount of data, and I don't use the time-consuming rand() function).

The slowest remaining operator is normal division. My original guess was that division might be slower because, for a lot of cases (such as 42/5) the results are non-integer, so it has to use (mildly slower) floating-point hardware to find the result. On the other hand, I realized that I didn't use integer in my benchmark, so all of my mathematical operations were hypothetically done on floating-point hardware anyhow. Is there any monk out there who has a better reason for why division might take longer than the other mathematical operators?

I thought that running this benchmark was interesting, and I hope it might be useful to anyone who really needs to optimize their code. I know that a typical response to anyone who says "my foo code is too slow" is something along the lines of "then rewrite foo in C"... which is probably true, if speed is critical (and if you actually know how to program well in C.) If you are looking to optimize perl code, however, I will admit that these results are probably the last things you should try to use; none of these operators is likely anywhere near as slow as your average regex or simple function call. At any rate, happy perling!

Replies are listed 'Best First'.
Re: Benchmarking the basic operations
by chipmunk (Parson) on Jan 24, 2001 at 23:43 UTC
    It would be interesting to see results of this benchmark from other platforms; these are operations where the efficiency definitely depends on the underlying hardware.

    Since >= and < took almost exactly the same amount of time, I conjectured that $foo >= $bar might get translated to $bar < $foo by the compiler, but this appears to not be the case:

    But then, $foo >= $bar is actually equivalent to $bar <= $foo, so you don't gain anything by doing the conversion. :)

      <sarcasm>Yes, then we could make our programs run faster with:

      if( $^O =~ /^MSWin/ ) { $comp= $a lt $b; } else { $comp= $b gt $a; }

      Please run the benchmarks 5 times with the order in which the different tests are run changed each time (prepend different letters to each case as runs them in order sorted by name). Then you can try to interpet the numbers. It wouldn't surprise me if the optimizer removed those operations anyway and you are just timing a bunch of loops.

      It is quite normal for the exact same operation to benchmark 5% or even more different each time you try. This is especially true of tiny things like what you are trying to benchmark.

      It just so happens that these are also the types of things that you don't want to optimize.

      Comparing the speed of lt, gt, ge, le, ne, and eq is just, well, silly. The difference comes down to a fraction of a machine language instruction. The time it takes to dispatch a single Perl opcode dwarfs that like a flea on an aircraft carrier.

      Some of the other comparisons might be interesting if the benchmarking code and results were shown to be actually doing a comparison. But I want to stress that this is not the type of stuff you should even think about when trying to make your code run faster.

      Okay, I feel slightly better. Now back to our regularly schedule trolling^Wdownvoting^Wscholarly discussions. (:

              - tye (but my friends call me "Tye")
        Yes, then we could make our programs run faster with:
        if( $^O =~ /^MSWin/ ) { $comp= $b gt $a; } else { $comp= $a lt $b; }
        Well, you must not have done the actual benchmark. I did, and I found that on MSWin32, it's gt that is faster, not lt. So the correct way to optimize this comparison is:
        if ($^O =~ /^MSWin/) { $comp= $a gt $b; } else { $comp= $b lt $a; }


        You said,"

          Comparing the speed of lt, gt, ge, le, ne, and eq is just, well, silly. The difference comes down to a fraction of a machine language instruction. The time it takes to dispatch a single Perl opcode dwarfs that like a flea on an aircraft carrier. "

        How do you define dispatch and opcode? Apparently I'm not even literate in what constructs to focus on when optimizing, or how to isolate dispatch in a benchmark.