Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

What really surprised me was how slow it became once I tried to sort the data just once

Agreed! Not sure if this is a bug or a feature but I found it very surprising! On my machine, when testing Discipulus' superb one liner just now:

perl -lae "$r{$F[0]}+=$F[1]}{print qq($_\t$r{$_})for sort{$r{$b}<=>$r{ +$a}||$a cmp $b}keys %r" big1.txt big2.txt big3.txt >oneliner.tmp

I almost fell off my chair when the execution time dropped from 92 seconds to 38 seconds (with identical correct output) simply by changing:

sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r
to:
sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r

After picking myself up off the floor, I ran:

perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r" LISTOP (0x299c080) leave [1] OP (0x299c050) enter COP (0x299c0c0) nextstate LISTOP (0x299c120) sort OP (0x299c160) pushmark UNOP (0x2882ed0) null LISTOP (0x2882fb0) scope COP (0x2882ff0) null [195] UNOP (0x2883050) null LOGOP (0x2883088) or BINOP (0x28831e8) ncmp [5] UNOP (0x26610d0) null [150] UNOP_AUX (0x299c010) multideref OP (0x26611b8) null [7] UNOP (0x2883228) null [150] UNOP_AUX (0x299bfd0) multideref OP (0x2661098) null [7] BINOP (0x28830c8) scmp [8] UNOP (0x2883178) null [14] PADOP (0x28831b0) gvsv GV (0x18a8c0) +*a UNOP (0x2883108) null [14] PADOP (0x2883140) gvsv GV (0x2655320) + *b UNOP (0x2882f08) keys [11] UNOP (0x2882f40) rv2hv [10] PADOP (0x2882f78) gv GV (0x2654ae0) *r

which is 25 OPs ... while:

perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r" LISTOP (0x2940c18) leave [1] OP (0x2940c90) enter COP (0x2940bb8) nextstate LISTOP (0x2940b78) sort OP (0x2940c58) pushmark UNOP (0x2990018) null LISTOP (0x2940d38) scope COP (0x2940d78) null [195] BINOP (0x2940dd8) ncmp [5] UNOP (0x2644940) null [150] UNOP_AUX (0x298ff68) multideref OP (0x2644a28) null [7] UNOP (0x2940e18) null [150] UNOP_AUX (0x298ff28) multideref OP (0x2644908) null [7] LISTOP (0x298ffa8) sort OP (0x298ffe8) pushmark UNOP (0x2940ad0) keys [11] UNOP (0x2940b08) rv2hv [10] PADOP (0x2940b40) gv GV (0x26344d8) *r

and (update):

perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort keys %r" LISTOP (0x29e2518) leave [1] OP (0x29bda60) enter COP (0x29e2558) nextstate LISTOP (0x29e25b8) sort OP (0x29e25f8) pushmark UNOP (0x29e2628) null LISTOP (0x29e2778) scope COP (0x29e27b8) null [195] BINOP (0x29e2818) ncmp [5] UNOP (0xed31b0) null [150] UNOP_AUX (0x29bda20) multideref OP (0xed3298) null [7] UNOP (0x29e2858) null [150] UNOP_AUX (0x29bd9e0) multideref OP (0xed3178) null [7] LISTOP (0x29e2660) sort OP (0x29e26a0) pushmark UNOP (0x29e26d0) keys [8] UNOP (0x29e2708) rv2hv [7] PADOP (0x29e2740) gv GV (0xec4098) *r

are just 20 OPs. I'm out of my depth here, so we might need a dave_the_m-class Perl internals guru to get to the bottom of this mystery. :)

Update: I believe the second example above can be shortened from:

sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r
to simply:
sort{$r{$b}<=>$r{$a}}sort keys %r
...also, for correctness, I think we should further add:
use sort 'stable';
at the top of the program because we are relying on a stable sort for this trick to work. Further update: Oops, this is not necessary, didn't read far enough in the sort docs, "The default sort has been stable since v5.8.0, and given this consistent behaviour for almost two decades, everyone has come to assume stability".

Updated: fixed typo, added -e after -MO=Terse in examples above. Added comment about stable sort.


In reply to Re^4: Rosetta Code: Long List is Long -- dualvar by eyepopslikeamosquito
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2024-04-19 06:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found