Re^3: Rosetta Code: Long List is Long (Updated Solutions - dualvar)

by eyepopslikeamosquito (Archbishop)
on Dec 05, 2022 at 22:28 UTC ( [id://11148585]=note )


in reply to Re^2: Rosetta Code: Long List is Long (Updated Solutions)
in thread Rosetta Code: Long List is Long

I noticed it, but (wrongly) assumed applying its remarkable two-sort trick to my original solution would have the same effect.

For completeness, I created llil2d.pl (shown below) by applying your dualvar array trick to my original llil2.pl two-sort solution, with minimal changes. I can confirm that it is indeed about 3 seconds faster than llil2.pl and uses slightly less memory. Despite using Perl for 20 years, I'd never heard of dualvar before (update: oops, turns out I had :). Huge kudos to marioroy for unearthing this!

llil2d start
get_properties : 11 secs
sort + output : 22 secs
total : 33 secs

Memory use (Windows Private Bytes): 2,824,184K (slightly lower than 2,896,104K for llil2.pl)
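For anyone else who has not met dualvar before, here is a minimal sketch of what it does. The sample name and count are made up for illustration; only `dualvar` from Scalar::Util is taken from the script itself.

```perl
use strict;
use warnings;
use Scalar::Util qw{dualvar};

# dualvar(NUM, STRING) returns one scalar that carries two values:
# NUM is used in numeric context, STRING in string context.
my $item = dualvar( 50, "camel" );    # hypothetical sample data

my $as_string = "$item";      # string context  -> the name
my $as_number = 0 + $item;    # numeric context -> the count

print "$as_string\t$as_number\n";
```

Because both values live in a single scalar, a flat array of dualvars can be sorted by name with a string comparison and by count with a numeric comparison, without a hash lookup inside the comparison.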

Here is my adjusted llil2d.pl:

# llil2d.pl. Remarkable dualvar version based on [marioroy]'s concoction.
# Example run:  perl llil2d.pl tt1.txt tt2.txt tt3.txt >out.txt

use strict;
use warnings;
use feature qw{say};
use Scalar::Util qw{dualvar};

# ----------------------------------------------------------------------
# LLiL specification
# ------------------
# A LLiL-format file is a text file.
# Each line consists of a lowercase name a TAB character and a non-negative integer count.
# That is, each line must match : ^[a-z]+\t\d+$
# For example, reading the LLiL-format files, tt1.txt containing:
#    camel\t42
#    pearl\t94
#    dromedary\t69
# and tt2.txt containing:
#    camel\t8
#    hello\t12345
#    dromedary\t1
# returns this hashref:
#    $hash_ret{"camel"}     = 50
#    $hash_ret{"dromedary"} = 70
#    $hash_ret{"hello"}     = 12345
#    $hash_ret{"pearl"}     = 94
# That is, values are added for items with the same key.
#
# To get the required LLiL text, you must sort the returned hashref
# descending by value and insert a TAB separator:
#    hello\t12345
#    pearl\t94
#    dromedary\t70
#    camel\t50
# To make testing via diff easier, we further sort ascending by name
# for lines with the same value.
# ----------------------------------------------------------------------

# Function get_properties
# Read a list of LLiL-format files
# Return a reference to a hash of properties
sub get_properties {
    my $files = shift;    # in: reference to a list of LLiL-format files
    my %hash_ret;         # out: reference to a hash of properties
    for my $fname ( @{$files} ) {
        open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
        while (<$fh>) {
            chomp;
            my ($word, $count) = split /\t/;
            $hash_ret{$word} += $count;
        }
        close($fh) or die "error: close '$fname': $!";
    }
    return \%hash_ret;
}

# ----------------- mainline -------------------------------------------
@ARGV or die "usage: $0 file...\n";
my @llil_files = @ARGV;

warn "llil2d start\n";
my $tstart1 = time;
my $href = get_properties( \@llil_files );
my $tend1 = time;
my $taken1 = $tend1 - $tstart1;
warn "get_properties : $taken1 secs\n";

my $tstart2 = time;
my @data;
while ( my ($k, $v) = each %{$href} ) {
    push @data, dualvar($v, $k);
}

# Using two sorts is waaay faster than one! (see [id://11148545])
for my $key ( sort { $b <=> $a } sort @data ) {
    say "$key\t" . (0 + $key);
}
my $tend2 = time;
my $taken2 = $tend2 - $tstart2;
my $taken = $tend2 - $tstart1;
warn "sort + output : $taken2 secs\n";
warn "total : $taken secs\n";
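To see the two-sort step in isolation, here is a small sketch run on the totals from the specification comment, plus one made-up entry ("apple") that ties with "dromedary" so the tie-break is visible. It assumes the outer sort is stable, that is, it keeps the name order produced by the inner sort for equal counts; as far as I know that holds for the mergesort used by current perls.

```perl
use strict;
use warnings;
use Scalar::Util qw{dualvar};

# Totals from the spec comment, plus a hypothetical "apple" tying on 70.
my %count = ( camel => 50, dromedary => 70, hello => 12345, pearl => 94, apple => 70 );

# One dualvar per entry: numeric value is the count, string value is the name.
my @data = map { dualvar( $count{$_}, $_ ) } keys %count;

# Inner sort: default string comparison, so ascending by name.
# Outer sort: numeric comparison, so descending by count.
my @sorted = sort { $b <=> $a } sort @data;

my @lines;
for my $key (@sorted) {
    push @lines, "$key\t" . ( 0 + $key );
}
print "$_\n" for @lines;
```

With this data, apple is listed before dromedary (both 70), after pearl and before camel.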

Update: llil2grt.pl is about three seconds faster than llil2d.pl above, while using slightly less memory.
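llil2grt.pl is not shown in this node. Assuming the "grt" in its name refers to the Guttman-Rosler Transform, the general idea can be sketched as follows: pack the sort key into a plain string so that a default `sort` with no comparison block produces the required order. This is only an illustration of the technique, not the actual llil2grt.pl; the sample data and the 32-bit limit on counts are my assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample data; "apple" ties with "dromedary" on count.
my %count = ( camel => 50, dromedary => 70, hello => 12345, pearl => 94, apple => 70 );

# Assumption: every count fits in an unsigned 32-bit integer.
my $MAX = 0xFFFFFFFF;

# 1. Pack the inverted count (big-endian, 4 bytes) in front of the name,
#    so that a larger count yields a string that sorts earlier.
my @packed = map { pack( 'Na*', $MAX - $count{$_}, $_ ) } keys %count;

# 2. Plain string sort, no comparison block.
my @sorted = sort @packed;

# 3. Unpack each string back into count and name.
my @lines = map {
    my ( $inv, $name ) = unpack( 'Na*', $_ );
    "$name\t" . ( $MAX - $inv );
} @sorted;

print "$_\n" for @lines;
```

Equal counts share the same 4-byte prefix, so ties fall back to comparing the names that follow it, which gives the ascending-by-name order the specification asks for.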

