Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^2: Rosetta Code: Long List is Long -- dualvar

by marioroy (Prior)
on Dec 02, 2022 at 04:07 UTC ( [id://11148488]=note: print w/replies, xml ) Need Help??


in reply to Re: Rosetta Code: Long List is Long
in thread Rosetta Code: Long List is Long

The following is my fun spin using dualvar, based on choroba.pl. I tried minimizing memory consumption by using one hash and one array. The memory consumption is similar to llil.pl and performance slightly faster than choroba.pl.

See also, parallel solution update.

#!/usr/bin/perl use warnings; use strict; use feature qw{ say }; use Scalar::Util qw{ dualvar }; warn "start\n"; my $tstart1 = time; our (%by_word, @data); while (<>) { chomp; my ($k, $v) = split /\t/, $_; $by_word{$k} += $v; } my $tend1 = time; warn "get properties: ", $tend1 - $tstart1, " secs\n"; my $tstart2 = time; while (my ($k, $v) = each %by_word) { push @data, dualvar($v, $k); } # output array of dualvars; sorted by number desc, string asc say "$_\t".(0+$_) for sort { $b <=> $a } sort { $a cmp $b } @data; my $tend2 = time; warn "sort + output: ", $tend2 - $tstart2, " secs\n"; warn "total: ", $tend2 - $tstart1, " secs\n";

Replies are listed 'Best First'.
Re^3: Rosetta Code: Long List is Long -- dualvar
by choroba (Cardinal) on Dec 02, 2022 at 11:20 UTC
    Interesting!

    What really surprised me was how slow it became once I tried to sort the data just once:

    sort { $b <=> $a || $a cmp $b } @data;
    Before:
    sort + output: 19 secs
    After:
    sort + output: 32 secs

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      What really surprised me was how slow it became once I tried to sort the data just once

      Agreed! Not sure if this is a bug or a feature but I found it very surprising! On my machine, when testing Discipulus' superb one liner just now:

      perl -lae "$r{$F[0]}+=$F[1]}{print qq($_\t$r{$_})for sort{$r{$b}<=>$r{ +$a}||$a cmp $b}keys %r" big1.txt big2.txt big3.txt >oneliner.tmp

      I almost fell off my chair when the execution time dropped from 92 seconds to 38 seconds (with identical correct output) simply by changing:

      sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r
      to:
      sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r

      After picking myself up off the floor, I ran:

      perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}||$a cmp $b}keys %r" LISTOP (0x299c080) leave [1] OP (0x299c050) enter COP (0x299c0c0) nextstate LISTOP (0x299c120) sort OP (0x299c160) pushmark UNOP (0x2882ed0) null LISTOP (0x2882fb0) scope COP (0x2882ff0) null [195] UNOP (0x2883050) null LOGOP (0x2883088) or BINOP (0x28831e8) ncmp [5] UNOP (0x26610d0) null [150] UNOP_AUX (0x299c010) multideref OP (0x26611b8) null [7] UNOP (0x2883228) null [150] UNOP_AUX (0x299bfd0) multideref OP (0x2661098) null [7] BINOP (0x28830c8) scmp [8] UNOP (0x2883178) null [14] PADOP (0x28831b0) gvsv GV (0x18a8c0) +*a UNOP (0x2883108) null [14] PADOP (0x2883140) gvsv GV (0x2655320) + *b UNOP (0x2882f08) keys [11] UNOP (0x2882f40) rv2hv [10] PADOP (0x2882f78) gv GV (0x2654ae0) *r

      which is 25 OPs ... while:

      perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r" LISTOP (0x2940c18) leave [1] OP (0x2940c90) enter COP (0x2940bb8) nextstate LISTOP (0x2940b78) sort OP (0x2940c58) pushmark UNOP (0x2990018) null LISTOP (0x2940d38) scope COP (0x2940d78) null [195] BINOP (0x2940dd8) ncmp [5] UNOP (0x2644940) null [150] UNOP_AUX (0x298ff68) multideref OP (0x2644a28) null [7] UNOP (0x2940e18) null [150] UNOP_AUX (0x298ff28) multideref OP (0x2644908) null [7] LISTOP (0x298ffa8) sort OP (0x298ffe8) pushmark UNOP (0x2940ad0) keys [11] UNOP (0x2940b08) rv2hv [10] PADOP (0x2940b40) gv GV (0x26344d8) *r

      and (update):

      perl -MO=Terse -e "sort{$r{$b}<=>$r{$a}}sort keys %r" LISTOP (0x29e2518) leave [1] OP (0x29bda60) enter COP (0x29e2558) nextstate LISTOP (0x29e25b8) sort OP (0x29e25f8) pushmark UNOP (0x29e2628) null LISTOP (0x29e2778) scope COP (0x29e27b8) null [195] BINOP (0x29e2818) ncmp [5] UNOP (0xed31b0) null [150] UNOP_AUX (0x29bda20) multideref OP (0xed3298) null [7] UNOP (0x29e2858) null [150] UNOP_AUX (0x29bd9e0) multideref OP (0xed3178) null [7] LISTOP (0x29e2660) sort OP (0x29e26a0) pushmark UNOP (0x29e26d0) keys [8] UNOP (0x29e2708) rv2hv [7] PADOP (0x29e2740) gv GV (0xec4098) *r

      are just 20 OPs. I'm out of my depth here, so we might need a dave_the_m-class Perl internals guru to get to the bottom of this mystery. :)

      Update: I believe the second example above can be shortened from:

      sort{$r{$b}<=>$r{$a}}sort{$a cmp $b}keys %r
      to simply:
      sort{$r{$b}<=>$r{$a}}sort keys %r
      ...also, for correctness, I think we should further add:
      use sort 'stable';
      at the top of the program because we are relying on a stable sort for this trick to work. Further update: Oops, this is not necessary, didn't read far enough in the sort docs, "The default sort has been stable since v5.8.0, and given this consistent behaviour for almost two decades, everyone has come to assume stability".

      Updated: fixed typo, added -e after -MO=Terse in examples above. Added comment about stable sort.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11148488]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (5)
As of 2024-03-28 20:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found