Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Re^4: Rosetta Code: Long List is Long -- dualvar

by eyepopslikeamosquito (Bishop)
on Dec 04, 2022 at 06:35 UTC ( #11148545=note: print w/replies, xml ) Need Help??


in reply to Re^3: Rosetta Code: Long List is Long -- dualvar
in thread Rosetta Code: Long List is Long

What really surprised me was how slow it became once I tried to sort the data just once

Agreed! Not sure if this is a bug or a feature but I found it very surprising! On my machine, when testing Discipulus' superb one liner just now:

perl -lae "$r{$F[0]}+=$F[1]}{print qq($_\t$r{$_})for sort{$r{$b}<=>$r{ +$a}||$a cmp $b}keys %r" big1.txt big2.txt big3.txt >oneliner.tmp

I almost fell off my chair when the execution time dropped from 92 seconds to 38 seconds (with identical correct output) simply by changing:

sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r
to:
sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r

After picking myself up off the floor, I ran:

perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r" LISTOP (0x299c080) leave [1] OP (0x299c050) enter COP (0x299c0c0) nextstate LISTOP (0x299c120) sort OP (0x299c160) pushmark UNOP (0x2882ed0) null LISTOP (0x2882fb0) scope COP (0x2882ff0) null [195] UNOP (0x2883050) null LOGOP (0x2883088) or BINOP (0x28831e8) ncmp [5] UNOP (0x26610d0) null [150] UNOP_AUX (0x299c010) multideref OP (0x26611b8) null [7] UNOP (0x2883228) null [150] UNOP_AUX (0x299bfd0) multideref OP (0x2661098) null [7] BINOP (0x28830c8) scmp [8] UNOP (0x2883178) null [14] PADOP (0x28831b0) gvsv GV (0x18a8c0) +*a UNOP (0x2883108) null [14] PADOP (0x2883140) gvsv GV (0x2655320) + *b UNOP (0x2882f08) keys [11] UNOP (0x2882f40) rv2hv [10] PADOP (0x2882f78) gv GV (0x2654ae0) *r

which is 25 OPs ... while:

perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r" LISTOP (0x2940c18) leave [1] OP (0x2940c90) enter COP (0x2940bb8) nextstate LISTOP (0x2940b78) sort OP (0x2940c58) pushmark UNOP (0x2990018) null LISTOP (0x2940d38) scope COP (0x2940d78) null [195] BINOP (0x2940dd8) ncmp [5] UNOP (0x2644940) null [150] UNOP_AUX (0x298ff68) multideref OP (0x2644a28) null [7] UNOP (0x2940e18) null [150] UNOP_AUX (0x298ff28) multideref OP (0x2644908) null [7] LISTOP (0x298ffa8) sort OP (0x298ffe8) pushmark UNOP (0x2940ad0) keys [11] UNOP (0x2940b08) rv2hv [10] PADOP (0x2940b40) gv GV (0x26344d8) *r

and (update):

perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort keys %r" LISTOP (0x29e2518) leave [1] OP (0x29bda60) enter COP (0x29e2558) nextstate LISTOP (0x29e25b8) sort OP (0x29e25f8) pushmark UNOP (0x29e2628) null LISTOP (0x29e2778) scope COP (0x29e27b8) null [195] BINOP (0x29e2818) ncmp [5] UNOP (0xed31b0) null [150] UNOP_AUX (0x29bda20) multideref OP (0xed3298) null [7] UNOP (0x29e2858) null [150] UNOP_AUX (0x29bd9e0) multideref OP (0xed3178) null [7] LISTOP (0x29e2660) sort OP (0x29e26a0) pushmark UNOP (0x29e26d0) keys [8] UNOP (0x29e2708) rv2hv [7] PADOP (0x29e2740) gv GV (0xec4098) *r

are just 20 OPs. I'm out of my depth here, so we might need a dave_the_m-class Perl internals guru to get to the bottom of this mystery. :)

Update: I believe the second example above can be shortened from:

sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r
to simply:
sort{$r{$b}<=>$r{$a}}sort keys %r
...also, for correctness, I think we should further add:
use sort 'stable';
at the top of the program because we are relying on a stable sort for this trick to work. Further update: Oops, this is not necessary, didn't read far enough in the sort docs, "The default sort has been stable since v5.8.0, and given this consistent behaviour for almost two decades, everyone has come to assume stability".

Updated: fixed typo, added -e after -MO=Terse in examples above. Added comment about stable sort.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11148545]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (5)
As of 2023-01-31 10:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?