llil2cmd.pl - abbreviated version of llil2grt.pl
For cheap thrills, I created llil2cmd.pl, a short command line version of llil2grt.pl:
#!perl -n
# llil2cmd.pl. Abbreviated version of llil2grt.pl.
chomp; ($w,$c) = split/\t/; $h{$w} += $c;
END {
$\=$/;
push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
print substr($_,4) for sort @l;
}
Curiously, this abbreviated version runs at about the same speed on Windows, but significantly
faster on my Ubuntu Linux VM:
> time perl llil2grt.pl big1.txt big2.txt big3.txt >grt1.tmp
llil2grt start
get_properties : 8 secs
sort + output : 22 secs
total : 30 secs
real 0m33.475s
user 0m32.180s
sys 0m1.295s
> time perl llil2cmd.pl big1.txt big2.txt big3.txt >cmd1.tmp
real 0m28.937s
user 0m27.843s
sys 0m1.093s
> diff cmd1.tmp grt1.tmp
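For anyone puzzled by the pack/sort idiom in the END block of llil2cmd.pl, here is a minimal sketch of how it orders records. The sample words and counts are made up for illustration and are not from the test files; the sketch assumes counts fit in 32 bits.

```perl
use strict;
use warnings;

# Hypothetical sample data: word => count (not from the original post).
my %h = (apple => 3, pear => 10, fig => 3, kiwi => 1);

# Prefix each "word\tcount" record with the negated count packed as a
# 32-bit big-endian integer. Negation wraps around, so larger counts
# yield smaller 4-byte prefixes and therefore sort first.
my @packed = map { pack('NA*', -$h{$_}, "$_\t$h{$_}") } keys %h;

# Plain string sort, no comparison callback needed. Ties on the count
# fall through to the word itself, in ascending order.
my @sorted = map { substr($_, 4) } sort @packed;

print "$_\n" for @sorted;
```

With this data the records come out as pear, apple, fig, kiwi: descending by count, then ascending by word.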
To get more detailed timings, I hacked out a long short version:
#!perl -n
# llil2cmd-long.pl. A long short version of llil2grt.pl.
BEGIN {
$tstart1 = time;
}
chomp; ($w,$c) = split/\t/; $h{$w} += $c;
END {
my $tstart2 = time;
$\=$/;
push @l, pack('NA*',-$v,"$k\t$v") while ($k,$v)=each %h;
print substr($_,4) for sort @l;
my $tend2 = time;
my $taken1 = $tstart2 - $tstart1;
my $taken2 = $tend2 - $tstart2;
my $taken = $tend2 - $tstart1;
warn "get_properties : $taken1 secs\n";
warn "sort + output : $taken2 secs\n";
warn "total : $taken secs\n";
}
$ time perl llil2cmd-long.pl big1.txt big2.txt big3.txt >long1.tmp
get_properties : 7 secs
sort + output : 21 secs
total : 28 secs
real 0m28.629s
user 0m27.707s
sys 0m0.917s
> diff long1.tmp grt1.tmp
As you can see from the times reported by the Linux time command,
it seems that large lexical variables in Perl are significantly slower to clean up
at program exit than non-lexicals. The gap between the real time and the total that
the script itself reports is about three seconds larger for the lexical version:
33.475s vs 30 secs for llil2grt.pl, compared with 28.629s vs 28 secs for llil2cmd-long.pl.
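One way to probe this observation in isolation is a small script that fills a big hash held in either a file-scoped lexical or a package variable, and reports the time it measured itself. This is only a sketch: the script name, the mode argument, and the default size are assumptions of mine, and whether the difference shows up may depend on the perl version and platform.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical probe, not from the original post. Anything the shell's
# time command reports beyond the in-script figure is interpreter
# startup plus exit-time cleanup.
my $mode = shift(@ARGV) // 'lexical';   # 'lexical' or 'global'
my $n    = shift(@ARGV) // 500_000;     # number of hash entries (assumed size)

our %global_h;                          # package variable
my  %lexical_h;                         # file-scoped lexical

my $t0   = time;
my $href = $mode eq 'global' ? \%global_h : \%lexical_h;
$href->{"key$_"} = $_ for 1 .. $n;
my $count = keys %$href;

warn sprintf "%s: %d keys, in-script time %.2f secs\n",
    $mode, $count, time - $t0;
```

Saved as, say, cleanup_probe.pl, it could be run as "time perl cleanup_probe.pl lexical 5000000" and then again with "global", comparing the real time against the in-script time in each case.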
New perl 5.36 experimental for_list feature
After stumbling upon the node "perl 5.36 and the for_list feature - a simple speed comparison", I had to give the perl 5.36 for_list feature a try
(update: List::Util's pairmap might be worth a try given it was mentioned in a reply).
After building perl v5.36 from source (my Ubuntu system perl is v5.34; update: see the improved perl 5.38 build notes):
wget https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz
(update: run sha256sum perl-5.36.0.tar.gz and check that it matches
https://www.cpan.org/src/5.0/perl-5.36.0.tar.gz.sha256.txt)
tar -xzf perl-5.36.0.tar.gz
cd perl-5.36.0
./Configure -des -Dprefix=$HOME/localperl
make 2>&1 | tee make.tmp
make test 2>&1 | tee test.tmp
make install 2>&1 | tee install.tmp
and adding:
use 5.036;
use experimental qw/for_list declared_refs/;
to the top of llil2grt.pl while changing one line from:
while (my ($k, $v) = each %{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }
to:
for my ($k, $v) (%{$href}) { push @lines, pack('NA*', -$v, "$k\t$v") }
it produced the same result, but did not run appreciably faster.
Update: as for why it isn't much faster, see ikegami's replies at: Re^2: Why does each() always re-evaluate its argument? (Updated x2 - experimental "for_list" )
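For the List::Util pairmap suggestion mentioned earlier, a rough sketch of how the same line might look follows. I have not timed it; the sample hash is a made-up stand-in for the real $href, and the only claim is that it builds the same records as the each-based loop.

```perl
use strict;
use warnings;
use List::Util qw(pairmap);    # pairmap needs List::Util 1.29 or later

# Hypothetical stand-in for the hash reference used in llil2grt.pl.
my $href = { apple => 3, pear => 10, fig => 3 };

# each-based loop, as in the original line.
my @lines_each;
while (my ($k, $v) = each %{$href}) {
    push @lines_each, pack('NA*', -$v, "$k\t$v");
}

# pairmap sets $a to each key and $b to the matching value.
my @lines_pairmap = pairmap { pack('NA*', -$b, "$a\t$b") } %{$href};

# Both approaches should yield the same records once sorted.
my @out_each    = map { substr($_, 4) } sort @lines_each;
my @out_pairmap = map { substr($_, 4) } sort @lines_pairmap;

print "$_\n" for @out_pairmap;
```

Unlike for_list, this runs on perls older than 5.36, which makes it easy to compare against the each version on a system perl.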
Update: Improved Ubuntu Perl Build Notes
Manual install of CPAN Roman module
Later I manually installed Roman by CHORNY from CPAN into this local non-root Perl 5.36 as follows:
$ cd $HOME/localperlmodules
$ type perl
perl is hashed ($HOME/localperl/bin/perl)
$ wget https://www.cpan.org/modules/by-module/Roman/Roman-1.24.tar.gz
$ tar -xzf Roman-1.24.tar.gz
$ cd Roman-1.24
$ perl Makefile.PL 2>&1 | tee makefile.tmp
$ make 2>&1 | tee make.tmp
$ make test 2>&1 | tee test.tmp
$ make install 2>&1 | tee install.tmp
Update: Better to do it via: cpanm --from https://www.cpan.org/ --verify Roman 2>&1 | tee Roman.tmp
Updated: Added steps for building perl v5.36.0 from source and manual install of Roman module.
Noted that large lexical variables are slower to clean up at program exit.