Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: Efficiently sorting array by non-alpha strings

by Abigail-II (Bishop)
on Feb 20, 2004 at 16:31 UTC ( [id://330590]=note: print w/replies, xml ) Need Help??


in reply to Re: Re: Efficiently sorting array by non-alpha strings
in thread Efficiently sorting array by non-alpha strings

Your bucketsort is the elegant solution that I was seeking. Interesting though it's about twice as slow as my ugly version as indicated by this benchmark:
That's because you didn't fix my typos. Here's a different benchmark, with some of the other suggestions (I didn't include any that went beyond sorting on 'fld_type'):
#!/usr/bin/perl use strict; use warnings; use Benchmark qw /timethese cmpthese/; our @advocates = map {{fld_title => 'Rec' . $_, fld_type => [qw /National State Local/] -> [ra +nd 3]}} 1 .. shift || 1000; sub by_locale { my $a_type; my $b_type; $a_type = 1 if ($a->{'fld_type'} =~ /local/i); $a_type = 2 if ($a->{'fld_type'} =~ /state/i); $a_type = 3 if ($a->{'fld_type'} =~ /national/i); $b_type = 1 if ($b->{'fld_type'} =~ /local/i); $b_type = 2 if ($b->{'fld_type'} =~ /state/i); $b_type = 3 if ($b->{'fld_type'} =~ /national/i); $a_type <=> $b_type; } my $order = "\0local\0state\0national\0"; sub by_locale_b { index ($order, "\0" . lc ($a -> {fld_type}) . "\0") <=> index ($order, "\0" . lc ($b -> {fld_type}) . "\0") } our (@knowmad, @abigail, @borisz, @pirate); cmpthese -1 => { knowmad => '@knowmad = sort by_locale @advocates', abigail => 'my %bucket; map {push @{$bucket {$_ -> {fld_type}}} => $_} @advo +cates; @abigail = map {@{$bucket {$_}}} qw /Local State Nat +ional/;', borisz => 'my %h = (National => 3, State => 2, Local => 1); @borisz = sort {$h {$a -> {fld_type}} <=> $h {$b -> {fld_type}}} @advocates;', pirate => '@pirate = map {$_ -> [1]} sort {$a -> [0] <=> $b -> [0]} map {[lc ($_ -> {fld_type}), $_]} @advoca +tes', browseruk => '@browseruk = sort by_locale_b @advocates', }; __END__ Rate knowmad browseruk borisz pirate abigail knowmad 39.4/s -- -48% -70% -83% -92% browseruk 75.5/s 91% -- -42% -67% -85% borisz 130/s 229% 72% -- -43% -74% pirate 229/s 480% 203% 76% -- -53% abigail 491/s 1147% 551% 279% 115% --

Abigail

Replies are listed 'Best First'.
Re: Re: Efficiently sorting array by non-alpha strings
by borisz (Canon) on Feb 20, 2004 at 16:48 UTC
    But the Hash (%h) should only setup once in my solution, at the top of a script or so. That gives:
    Rate knowmad browseruk pirate abigail borisz knowmad 22.1/s -- -49% -80% -93% -95% browseruk 43.4/s 96% -- -61% -86% -89% pirate 111/s 403% 156% -- -64% -73% abigail 308/s 1291% 609% 177% -- -25% borisz 411/s 1758% 847% 270% 34%
    --
    Boris
      Highly suspicious results. You didn't publish source, but I have a hunch your code has %h lexical to the main program. This will make the sort a linear pass (no sorting happens). Try again with our %h.

      Abigail

        You are right, very impressive abigail-ii found my bug without any source. Here the corrected results:
        Rate knowmad browseruk borisz pirate abigail knowmad 22.1/s -- -49% -72% -80% -93% browseruk 43.0/s 94% -- -45% -62% -86% borisz 78.5/s 255% 83% -- -30% -74% pirate 112/s 407% 161% 43% -- -64% abigail 308/s 1291% 616% 292% 174% --
        Boris
Re: Re: Efficiently sorting array by non-alpha strings
by knowmad (Monk) on Feb 20, 2004 at 17:04 UTC

    I did fix a few typos but my benchmark isn't nearly as sophisticated as yours. I reran the tests using your code and am getting similar results, albeit my rate is much slower :(.

    Following BorisZ's comment, I placed the hash setup outside the cmpthese and indeed it is coming out as the speed demon. Interesting, I tried doing the same with the my %bucket line in your code and the performance decreased dramatically. I guess that using local variables provides better performance than globals?

    Thanks again for the insights!

    -Wm
      Referring to lexical variables from within a Benchmark is the mistake made most often when benchmarking. And a very large percentage of the benchmarks I've seen in the past years make this mistake. It has prompted me to give a talk about this (and other mistakes) at YAPC::NA last year.

      Moving my %h from BorisZ code to the main program is wrong. Moving my %bucket from my code to the main program is wrong (although for a different reason!).

      Abigail

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://330590]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (6)
As of 2024-04-18 02:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found