Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

<=> vs cmp

by Jasper (Chaplain)
on May 27, 2002 at 10:28 UTC ( [id://169539]=perlquestion: print w/replies, xml ) Need Help??

Jasper has asked for the wisdom of the Perl Monks concerning the following question:

I'm sorting some stuff, and sometimes I want to order by a name, and sometimes I want to order by a number. Obviously I can't do it all using <=>, but I can sort numbers using cmp. I was wondering what sort of a performance hit I would get using cmp on integers, rather than using some sort of coderef solution, but I find the results of a benchmark rather strange..
Benchmark: running comp, space, each for at least 3 CPU seconds... comp: 4 wallclock secs ( 3.03 usr + 0.01 sys = 3.04 CPU) @ 89 +8773.68/s (n=2732272) space: 5 wallclock secs ( 3.12 usr + -0.02 sys = 3.10 CPU) @ 79 +5990.97/s (n=2467572)
What's up with that?

Edit kudra, 2002-05-27 Replaced literal gt and lt with html

Replies are listed 'Best First'.
Re: = vs cmp
by ariels (Curate) on May 27, 2002 at 11:04 UTC

    You can sort with cmp instead of <=>; the question is if you'll like the results.

    DB<1> x sort { $a cmp $b } 1..10 0 1 1 10 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 DB<2> q
      whoops! I obviously am as stupid as I look. Didn't really notice this because all my ints are pretty much the same length, and the sorting by int gives fairly randomised results, anyway. Actually, I could zero-fill my script generated numbers, and store them that way.. That way I'd get the speed of sorting using cmp, and slightly simpler code.

      Hmmm, goes off and thinks.
Re: = vs cmp
by mrbbking (Hermit) on May 27, 2002 at 12:20 UTC
    You can sort names and numbers with sort, and then have pretty clean code:
    my $code = 'untested; no Perl on this machine...'; my @numbers = qw (5 6 7 2 4 5); my @names = qw(Larry Randal Tom Jon); @numbers = sort @numbers; @names = sort @names; foreach(@numbers){ print "$_\n"; } foreach(@names){ print "$_\n"; }
    ...I admit that I may be missing the point, since sort uses <=> internally, by default...
Re: = vs cmp
by Anonymous Monk on May 27, 2002 at 12:22 UTC
    Well, who can tell? You only gave us the results of the benchmark, not what you actually tested - neither the code, nor the input.

    Furthermore, your results don't give a huge difference.

    There isn't much that can be said based on your posting.

      Well, who can tell? You only gave us the results of the benchmark, not what you actually tested - neither the code, nor the input.

      yeah, sorry.. Now deleted the code, so I can't post exactly what I benchmarked, but it was just an array of 200 or so 4 digit integers. The code was as basic as I could make it.

      something like:
      use Benchmark; my @array = (6252,8278, ...array of 200 4 digit integers...; timethese(0, { sorty => \&sorty, space => \&spaceship, comp => \&comp }); sub sorty { return sort @array } sub comp { return sort {$a cmp $b} @array } sub spaceship { return sort {$a <=> $b} @array } ## gives: Benchmark: running comp, sorty, space, each for at least 3 CPU seconds +... comp: 2 wallclock secs ( 3.05 usr + 0.02 sys = 3.07CPU) @ 617 +076.87/s (n=1894426) sorty: 1 wallclock secs ( 3.06 usr + -0.00 sys = 3.06CPU) @ 650 +267.65/s (n=1989819) space: 3 wallclock secs ( 3.38 usr + 0.02 sys = 3.40CPU) @ 590 +187.65/s (n=2006638)
      I was just surprised that sorting using cmp was not much slower than <=>; for .. I realise now that I just hadn't thought about it too much (heavy weekend), and if I had, I probably wouldn't have had a preconcieved idea about the relative speeds..(fiddling with the sizes of the integers, making them 1-16 digits long doesn't have much effect on the relative (or actual) speed of the sorts, either, btw).

      I shall just use sort on its lonesome, I think.
        Your results can vary greatly depending on your data. Check out this code that gives you a pretty evenly distributed sample.
        use Benchmark; # Get 200 random 4-digit integers my @array = (grep {/\d{4}/} map { 10000 * (sprintf "%0.4f", rand) } (1 +..400))[1..200]; print "N = " . @array . "\n\n"; # Be sure we have the right sample siz +e timethese(0, { sorty => \&sorty, space => \&spaceship, comp => \&comp }); sub sorty { sort @array } sub comp { sort {$a cmp $b} @array } sub spaceship { sort {$a <=> $b} @array }
        PRINTS:
        N = 200 Benchmark: running comp, sorty, space, each for at least 3 CPU seconds +... comp: 3 wallclock secs ( 3.15 usr + 0.01 sys = 3.16 CPU) @ 1064618. +52/s (n=3368453) sorty: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 1096416 +.69/s (n=3415338) space: 3 wallclock secs ( 3.11 usr + 0.00 sys = 3.11 CPU) @ 1064940 +.23/s (n=3314094)
        Looks like a pretty even race to me!
        Well, you are not measuring anything but how fast Perl can call a subroutine. The very high iterations/per second for sorting a 200 element array should have tipped you off.

        One should realize that Benchmark calls your function in void context. And sorting in void context is optimized to not do the sorting. So, you haven't benchmark anything. Benchmarking is an art, and you shouldn't test something with one size of data and draw conclusions.

        Here's a better benchmark:

        use strict; use Benchmark; for (my $size = 10; $size <= 100_000; $size *= 10) { @main::array = map {int rand $size} 1 .. $size; timethese -3 => { "[$size] <=>" => 'my @array2 = sort {$a <=> $b} @main::array +', "[$size] cmp" => 'my @array2 = sort {$a cmp $b} @main::array +', } }
        Name "main::array" used only once: possible typo at ./sort_bench.pl line 9.
        Benchmark: running 10 <=>, 10 cmp, each for at least 3 CPU seconds...
          10 <=>:  4 wallclock secs ( 3.13 usr +  0.01 sys =  3.14 CPU) @ 45414.01/s (n=142600)
          10 cmp:  3 wallclock secs ( 3.01 usr +  0.00 sys =  3.01 CPU) @ 44175.08/s (n=132967)
        Benchmark: running 100 <=>, 100 cmp, each for at least 3 CPU seconds...
         100 <=>:  3 wallclock secs ( 3.23 usr +  0.01 sys =  3.24 CPU) @ 4453.70/s (n=14430)
         100 cmp:  3 wallclock secs ( 3.23 usr +  0.00 sys =  3.23 CPU) @ 4095.67/s (n=13229)
        Benchmark: running 1000 <=>, 1000 cmp, each for at least 3 CPU seconds...
        1000 <=>:  3 wallclock secs ( 3.22 usr +  0.01 sys =  3.23 CPU) @ 324.77/s (n=1049)
        1000 cmp:  3 wallclock secs ( 3.26 usr +  0.00 sys =  3.26 CPU) @ 292.64/s (n=954)
        Benchmark: running 10000 <=>, 10000 cmp, each for at least 3 CPU seconds...
        10000 <=>:  3 wallclock secs ( 3.09 usr +  0.01 sys =  3.10 CPU) @ 17.74/s (n=55)
        10000 cmp:  3 wallclock secs ( 3.08 usr +  0.00 sys =  3.08 CPU) @ 13.96/s (n=43)
        Benchmark: running 100000 <=>, 100000 cmp, each for at least 3 CPU seconds...
        100000 <=>:  4 wallclock secs ( 3.42 usr +  0.01 sys =  3.43 CPU) @  1.17/s (n=4)
        100000 cmp:  3 wallclock secs ( 3.30 usr +  0.01 sys =  3.31 CPU) @  0.91/s (n=3)
                    (warning: too few iterations for a reliable count)
        

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://169539]
Approved by virtualsue
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (5)
As of 2024-03-28 14:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found