http://qs321.pair.com?node_id=1197294


in reply to High Performance Game of Life

In Fastest way to lookup a point in a set we concluded, in isolation, that hash lookups were faster using split/join rather than pack/unpack.

With nothing to lose though, I tried changing split /:/, $z to unpack 'ii', $z and:

    join ':', $x - 1, $y - 1
    join ':', @_

to:

    pack 'ii', $x - 1, $y - 1
    pack 'ii', @_

... and almost fell off my chair when the run time for three million cells dropped from 287 seconds way down to 204 seconds!! What gives?
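
For readers without the original program to hand, here is a minimal sketch of what packed-key cell storage might look like. The sub name neighbour_keys and the %live hash are hypothetical, chosen for illustration; they are not taken from the High Performance Game of Life code.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative, not from the original program):
# given a packed cell key, return the packed keys of its eight neighbours.
sub neighbour_keys {
    my ($key) = @_;
    my ($x, $y) = unpack 'ii', $key;
    my @keys;
    for my $dx (-1 .. 1) {
        for my $dy (-1 .. 1) {
            next if $dx == 0 && $dy == 0;    # skip the cell itself
            push @keys, pack('ii', $x + $dx, $y + $dy);
        }
    }
    return @keys;
}

# Live cells are stored as hash keys; here a horizontal blinker.
my %live;
$live{ pack('ii', @$_) } = 1 for [0, 0], [1, 0], [2, 0];

# Count the live neighbours of cell (1, 1).
my $count = grep { exists $live{$_} } neighbour_keys(pack('ii', 1, 1));
print "live neighbours of (1,1): $count\n";    # prints 3
```

The join/split version would look the same with join ':' in place of pack 'ii' and split /:/ in place of unpack 'ii'.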

As shown below, there is no difference in the number of op codes:

    > perl -MO=Terse -e "split /:/, $z"
    LISTOP (0x2bf88f8) leave [1]
        OP (0x25b59a0) enter
        COP (0x2bf8940) nextstate
        LISTOP (0x25b59d8) split [2]
            PMOP (0x25b5a60) pushre
            UNOP (0x25b5a20) null [15]
                PADOP (0x25b5ad0) gvsv GV (0x25b0660) *z
            SVOP (0x25b5938) const [3] IV (0x25b0d80) 0

    > perl -MO=Terse -e "unpack 'ii', $z"
    LISTOP (0x2c387f8) leave [1]
        OP (0x645908) enter
        COP (0x645940) nextstate
        LISTOP (0x6459d8) unpack
            OP (0x6459a0) null [3]
            SVOP (0x645aa0) const [2] PV (0x63fd40) "ii"
            UNOP (0x645a20) null [15]
                PADOP (0x645a60) gvsv GV (0x63fe00) *z

    > perl -MO=Terse -e "join ':', @_"
    LISTOP (0x2a9d798) leave [1]
        OP (0x24f5938) enter
        COP (0x24f5970) nextstate
        LISTOP (0x24f5a08) join [3]
            OP (0x24f59d0) pushmark
            SVOP (0x24f5ad0) const [4] PV (0x24f0960) ":"
            UNOP (0x24f5a50) rv2av [2]
                PADOP (0x24f5a90) gv GV (0xe9b2c8) *_

    > perl -MO=Terse -e "pack 'ii', @_"
    LISTOP (0x2c4e318) leave [1]
        OP (0x645938) enter
        COP (0x645970) nextstate
        LISTOP (0x645a08) pack [3]
            OP (0x6459d0) pushmark
            SVOP (0x645ad0) const [4] PV (0x640a80) "ii"
            UNOP (0x645a50) rv2av [2]
                PADOP (0x645a90) gv GV (0xffb2c8) *_

There didn't appear to be any significant difference in memory consumption either.

Anyone got any ideas? The slowdown may be caused by split using a /:/ regex, since going through the regex engine is likely to cost more than unpack's fixed-format decoding. Note that in Fastest way to lookup a point in a set, we were measuring lookups in isolation, and lookups use join, not split. How to investigate further? Devel::NYTProf?
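
One cheap first step, before reaching for a profiler, might be to time the four operations separately with the core Benchmark module. This is only a sketch: the data set (1000 made-up points) and the labels are assumptions for illustration, and it measures the operations in isolation, so it will not capture hash or memory effects in the full program.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative data only: 1000 made-up coordinate pairs,
# not the three million cells of the original run.
my @points = map { [ $_ * 37 - 5000, 5000 - $_ * 91 ] } 1 .. 1000;
my @joined = map { join(':', @$_) } @points;
my @packed = map { pack('ii', @$_) } @points;

# Each sub decodes or encodes all 1000 keys once per iteration.
cmpthese(2000, {
    'split'  => sub { my @xy; @xy = split /:/, $_     for @joined; return },
    'unpack' => sub { my @xy; @xy = unpack 'ii', $_   for @packed; return },
    'join'   => sub { my $k;  $k  = join ':', @$_     for @points; return },
    'pack'   => sub { my $k;  $k  = pack 'ii', @$_    for @points; return },
});
```

If split turns out to be clearly slower than unpack here while join and pack are close, that would point at the decoding side rather than the key-building side.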

Update: From running:

    for my $r ( [-123456789, 987654321], [1,2] ) {
        my $pp = pack 'ii', @{$r};
        my $jj = join ':', @{$r};
        my $pplen = length $pp;
        my $jjlen = length $jj;
        print "$r->[0]:$r->[1] packlen=$pplen joinlen=$jjlen\n";
        my ($xpp, $ypp) = unpack 'ii', $pp;
        my ($xjj, $yjj) = split /:/, $jj;
        $xpp == $r->[0] or die;
        $ypp == $r->[1] or die;
        $xjj == $r->[0] or die;
        $yjj == $r->[1] or die;
    }

we see:

    -123456789:987654321 packlen=8 joinlen=20
    1:2 packlen=8 joinlen=3

That is, with pack 'ii' the hash key is always 8 bytes long (two 4-byte ints here), while with join ':' the key length varies with the number of digits in the x and y coordinates.

Update: As discovered by marioroy, 'i2' is faster than 'ii' in pack and unpack.
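
The speed difference is marioroy's finding and is not re-measured here. As a quick sanity check that the two templates are interchangeable, the sketch below confirms that 'i2' (a repeat count of 2) produces the same bytes as 'ii' and decodes them the same way. The variable names are just for illustration.

```perl
use strict;
use warnings;

my @xy = (-123456789, 987654321);

# 'i2' means "two native signed ints", the same layout as 'ii'.
my $ii = pack 'ii', @xy;
my $i2 = pack 'i2', @xy;
print "same bytes: ", ($ii eq $i2 ? "yes" : "no"), "\n";    # same bytes: yes

# Keys written with one template can be read back with the other.
my @back = unpack 'i2', $ii;
print "round trip: @back\n";    # round trip: -123456789 987654321
```

So switching templates should not require rebuilding any existing keys.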