Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: Substring Sort

by MeowChow (Vicar)
on Jun 01, 2001 at 02:24 UTC ( #84771=note: print w/replies, xml ) Need Help??


in reply to Substring Sort

Boredom == Benchmarks.

tye's routine wins overall, however, arhuman's does scale better with huge lists. I also tossed in a method of my own, based on tye's approach, which I was curious about. Not too shabby, but it scales rather poorly.

use strict; use warnings; use Benchmark qw(cmpthese); sub tyesort { @_[ do { my @by = map { (split '\|:\|', $_, 2)[1] } @_; sort {$by[$a] cmp $by[$b]} 0..$#_ } ]; } sub mcsort { @_[ do { my %by; @by{ map { (split '\|:\|', $_, 2)[1] } @_ } = 0..$#_; @by{ sort keys %by }; } ]; } sub ahsort { map { my ($n,$t) = split '\|:\|', $_, 2; $t.'|:|'.$n } sort map { my ($n,$t) = split '\|:\|', $_, 2; $t.'|:|'.$n } @_; } sub stsort { map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { my ($n,$t) = split('\|:\|',$_,2); [$_,$t] } @_; } for my $num (10, 25, 100, 250, 1000, 10000) { my $str = "aaaaa"; my @list = map { "$_|:|".$str++ } 0..$num; for (0..$#list) { my $i = int rand $_; @list[$_, $i] = @list[$i, $_]; } print "Comparing for $num elements\n"; cmpthese (-3, { ah => sub { ahsort(@list); }, mc => sub { mcsort(@list); }, st => sub { stsort(@list); }, tye => sub { tyesort(@list); } }); } #### RESULTS #### Comparing for 10 elements Rate st ah mc tye st 9664/s -- -10% -39% -40% ah 10762/s 11% -- -32% -33% mc 15789/s 63% 47% -- -2% tye 16049/s 66% 49% 2% -- Comparing for 25 elements Rate st ah tye mc st 3744/s -- -17% -40% -47% ah 4502/s 20% -- -28% -37% tye 6255/s 67% 39% -- -12% mc 7110/s 90% 58% 14% -- Comparing for 100 elements Rate st ah mc tye st 820/s -- -28% -37% -39% ah 1143/s 39% -- -12% -16% mc 1300/s 58% 14% -- -4% tye 1354/s 65% 18% 4% -- Comparing for 250 elements Rate st ah tye mc st 288/s -- -36% -39% -39% ah 451/s 56% -- -4% -5% tye 469/s 63% 4% -- -2% mc 476/s 65% 6% 2% -- Comparing for 1000 elements Rate st tye mc ah st 58.1/s -- -37% -40% -46% tye 92.9/s 60% -- -4% -13% mc 97.0/s 67% 4% -- -10% ah 107/s 85% 15% 11% -- Comparing for 10000 elements Rate mc st tye ah mc 1.39/s -- -59% -75% -82% st 3.38/s 142% -- -39% -56% tye 5.53/s 297% 64% -- -28% ah 7.70/s 452% 128% 39% --
BIG UPDATE: Of course, you could do things the straightforward way, as myocom has suggested, and only perform about 100 - 1000 times faster (I also made a slight change to my version which made it the fastest of those previously compared, but I suppose that doesn't matter a bit):
## changed sub mcsort { @_[ do { my %by; my @keys = map { (split '\|:\|', $_, 2)[1] } @_; @by{@keys} = 0..$#_; @by{sort @keys}; } ]; } ## added sub myosort { sort { (split '\|:\|', $a, 2)[1] cmp (split '\|:\|', $b, 2)[1] } @_; } ## NEW RESULTS ## Comparing for 10 elements Rate st ah tye mc myo st 8924/s -- -2% -42% -44% -99% ah 9098/s 2% -- -41% -43% -99% tye 15315/s 72% 68% -- -4% -98% mc 15904/s 78% 75% 4% -- -98% myo 653579/s 7224% 7084% 4168% 4010% -- Comparing for 25 elements Rate st ah tye mc myo st 3416/s -- -11% -43% -51% -99% ah 3853/s 13% -- -36% -45% -99% tye 5990/s 75% 55% -- -14% -99% mc 6950/s 103% 80% 16% -- -99% myo 558070/s 16236% 14382% 9217% 7930% -- Comparing for 100 elements Rate st ah tye mc myo st 748/s -- -23% -41% -58% -100% ah 967/s 29% -- -24% -46% -100% tye 1264/s 69% 31% -- -29% -100% mc 1789/s 139% 85% 42% -- -99% myo 323567/s 43138% 33372% 25505% 17984% -- Comparing for 250 elements Rate st ah tye mc myo st 265/s -- -30% -38% -62% -100% ah 376/s 42% -- -12% -46% -100% tye 426/s 61% 13% -- -38% -100% mc 692/s 162% 84% 63% -- -100% myo 164515/s 62088% 43687% 38543% 23666% -- Comparing for 1000 elements Rate st tye ah mc myo st 52.9/s -- -37% -42% -64% -100% tye 84.1/s 59% -- -7% -43% -100% ah 90.9/s 72% 8% -- -38% -100% mc 147/s 177% 74% 61% -- -100% myo 48711/s 92007% 57796% 53482% 33147% -- Comparing for 2500 elements Rate st tye ah mc myo st 17.6/s -- -38% -47% -61% -100% tye 28.5/s 62% -- -14% -36% -100% ah 33.3/s 89% 17% -- -26% -100% mc 44.8/s 154% 57% 34% -- -100% myo 18884/s 107077% 66153% 56551% 42040% -- Comparing for 10000 elements Rate st tye ah mc myo st 3.13/s -- -35% -55% -57% -100% tye 4.81/s 54% -- -31% -34% -100% ah 6.93/s 122% 44% -- -5% -100% mc 7.28/s 133% 51% 5% -- -100% myo 3842/s 122850% 79817% 55337% 52666% --
Yet Another Update: I just realized that my little mcsort routine doesn't really work (it has "issues" with duplicate sort keys, to put it mildly). This doesn't affect the other results, however.
   MeowChow                                   
               s aamecha.s a..a\u$&owag.print

Replies are listed 'Best First'.
(tye)Re3: Substring Sort
by tye (Sage) on Jun 01, 2001 at 07:30 UTC

    When I see such a huge disparity in benchmarks, the first thing I think is "oops, implemented one of them wrong".

    I think the problem here is that myocom's sort is the only one where the return value from the subroutine is also the return value from sort, which means that when that function is called in a scalar context, sort ends up in a scalar context which causes it to just return undef and do no work.

    I added @_= to the subs passed to Benchmark and got these results:

      Thanks for pointing that out! I knew something was wrong with those numbers, but for the life of me, couldn't figure out what. Tossing in a print statement to check the return value obviously didn't help things :-)

      Regarding my sort routine, if you run my updated version (your scores look like the original) on a Perl 5.6.1, you should find that it's the fastest until about 10,000 elements, at which point arhuman's begins to take the lead:

      Comparing for 1000 elements Rate myo st tye ah mc myo 18.4/s -- -61% -72% -78% -83% st 46.9/s 154% -- -29% -43% -58% tye 66.5/s 261% 42% -- -20% -40% ah 82.7/s 348% 76% 24% -- -25% mc 111/s 500% 136% 66% 34% -- Comparing for 2500 elements Rate myo st tye ah mc myo 7.46/s -- -54% -69% -76% -78% st 16.3/s 118% -- -32% -47% -51% tye 24.0/s 222% 48% -- -22% -28% ah 31.0/s 315% 90% 29% -- -8% mc 33.5/s 349% 106% 39% 8% -- Comparing for 10000 elements Rate myo st tye mc ah myo 1.36/s -- -53% -68% -76% -78% st 2.88/s 112% -- -33% -49% -55% tye 4.29/s 214% 49% -- -25% -32% mc 5.69/s 317% 97% 33% -- -10% ah 6.34/s 365% 120% 48% 12% --
      Perl 5.6 gives rather different results though...
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print
Re: Re: Substring Sort
by myocom (Deacon) on Jun 01, 2001 at 03:22 UTC

    Correct me if I'm wrong, but I believe the original poster wanted to sort those strings by a substring (rather than just sorting the substring)? So while you folks are going about your fancy, big city mapping, you're throwing away part of the data he wanted. :-)

    Update: Say, they do get reconstructed, don't they. Still seems like an awful lot of code to get what he wanted, though...

    Update part deux: Woohoo! Thanks for adding me into the benchmark. :-)

      If you look at the code, you'll see that the original strings get reconstructed or re-referenced in each case.
         MeowChow                                   
                     s aamecha.s a..a\u$&owag.print

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://84771]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (4)
As of 2020-07-07 06:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?