More Betterer Game of Life

Without good design, good algorithms, and complete understanding of the program's operation, your carefully optimized code will amount to one of mankind's least fruitful creations - a fast slow program.

-- Michael Abrash

Don't diddle code to make it faster -- find a better algorithm

-- The Elements of Programming Style

In High Performance Game of Life, I chose a very simple design, storing all live cells in a single set. Though pleasing for its simplicity and unboundedness, its drawback is that counting live neighbours becomes a hash lookup, a chronic performance bottleneck. What to do?

Rather than spending more time optimizing my original design -- thus creating a "fast slow program" -- I researched the domain, learning of many different ways to do it. From the many possible approaches, I chose the simplest one I could find that looked interesting and enjoyable, and implemented it in pure Perl.

To try to keep my initial attempt short and understandable, I started with a simplified version based on the the brilliant works of Adam P. Goucher (apg), tiling the universe with 64 x 64 tiles in a conventional way, each tile having eight neighbours. Note that this was chosen for simplicity; more efficient schemes are available, such as the "brick wall" tiling used by Goucher in later versions. For background on the concept of breaking the game of life universe into overlapping tiles, see this description of Life128 and vlife.

My code is loosely based on apgnano (version 2) but advances one tick at a time (rather than two at a time, as apg did) and does not attempt to use universe history. Fair warning though. Despite striving to keep the code simple and short, it's way more complex than my original, Organism.pm swelling from 66 lines of code to 414.

Benchmark Results

I've updated the benchmark results given in my original node. As you can see, even this simplified version, with no attempts made at code optimization, is already an order of magnitude faster than the optimized version of the original.

Version 375K cells 750K cells 1.5 million cells 3 million cells

new Organism.pm (see below) 1 secs 1 secs 3 secs 5 secs

Organism.pm (Mario improvements) 13 secs 26 secs 52 secs 108 secs

Organism.pm (Original) 35 secs 70 secs 141 secs 284 secs

Game::Life::Infinite:Board 37 secs 96 secs 273 secs 905 secs

Version	375K cells	750K cells	1.5 million cells	3 million cells
new Organism.pm (see below)	1 secs	1 secs	3 secs	5 secs
Organism.pm (Mario improvements)	13 secs	26 secs	52 secs	108 secs
Organism.pm (Original)	35 secs	70 secs	141 secs	284 secs
Game::Life::Infinite:Board	37 secs	96 secs	273 secs	905 secs

As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

New Organism.pm (see below): 700,000K - 1,100,000K (update: seems to vary)
Organism.pm (Original): 1,455,004K
Organism.pm (Mario improvements): 1,596,368K
Game::Life::Infinite:Board: 18,138,504K

Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:

Version Lidka 30,000 ticks

new Organism.pm (see below) 58 secs

Organism.pm (Mario improvements) 450 secs

Organism.pm (Original) 1635 secs

Game::Life::Infinite:Board 640 secs

Version	Lidka 30,000 ticks
new Organism.pm (see below)	58 secs
Organism.pm (Mario improvements)	450 secs
Organism.pm (Original)	1635 secs
Game::Life::Infinite:Board	640 secs

Update

The latest and best Perl implementation: Re^2: More Betterer Game of Life
The latest and best C++ implementation: Re^2: More Betterer Game of Life
See also: Re^2: What's Perl good at or better than Python (Game of Life, LLiL, Rosetta and Performance References) (summary of GOL benchmarks, both Perl and C++)

Improving My Initial Attempt

There is certainly plenty of scope for improving my initial attempt. After all, I have not attempted any optimizations at all, just tried to implement ideas from apg's C++/assembler programs in a pure Perl form in a simple and clear way. While all feedback is welcome, I'm especially eager to see:

Refactorings that make the Perl code shorter, clearer, more idiomatic.
Performance optimizations.
Explanations of (and alternatives to) the bit-twiddling code below, specifically the bit operations in st64_tiletick() below I find especially hard to follow.
Bug fixes. I was shocked when my code worked the second time I ran it - just one coding blunder was corrected before my new Organism.pm passed tgol.t, tgol2.t, tgol3.t and the 30,000 lidka test! So I suspect there may be more bugs lurking in this brand new implementation.

As a minimum, any code refactorings should be tested by running tgol.t and tgol3.t from my original node. Note that this new version of Organism.pm is (or should be) 100% interface compatible with my original.

New Organism.pm

Finally, here is my new and improved Organism.pm (update: the latest and best Organism.pm can be found here):

package Organism;
use strict;

# Note: for this module, perl must be built with 64-bit integers
# use Config;
# $Config{ivsize} < 8 and die "perl ivsize=$Config{ivsize} is too smal
+l";

# ----------------------------------------------------------------
# The Universe is modelled as a set of overlapping tiles.
# For background, see http://conwaylife.com/wiki/Life128_and_vlife
# We use a simple scheme of 64 x 64 tiles (60 x 60 core) with
# conventional tiling (each tile has eight neighbours).
# Note that this was chosen for simplicity; more efficient schemes
# are available, such as the "brick wall" tiling used by Goucher
# in later versions (apgmera, version 3)
#
# This code is loosely based on apgnano (version 2) but advances
# one tick at a time (rather than advancing two at a time)
# and does not attempt to use universe history.
# This was to keep the implementation short.
#
# ----------------------------------------------------------------
# SQUARE TILE
my $BORDER_WIDTH       = 2;
my $BORDER_WIDTH_P1    = $BORDER_WIDTH + 1;
my $TILE_SIZE_FULL     = 64;
my $TILE_SIZE_FULL_M1  = $TILE_SIZE_FULL - 1;
my $TILE_SIZE_FULL_MB  = $TILE_SIZE_FULL - $BORDER_WIDTH;
my $TILE_SIZE_CORE     = $TILE_SIZE_FULL - 2 * $BORDER_WIDTH;
my $TILE_SIZE_CORE_P1  = $TILE_SIZE_CORE + 1;

my $MIDDLE60           = 0x3ffffffffffffffc;
my $LEFT62             = 0xfffffffffffffffc;
my $RIGHT62            = 0x3fffffffffffffff;
my $OUTER4             = 0xc000000000000003;
my $LEFTMIDDLE         = 0x3000000000000000;
my $RIGHTMIDDLE        = 0x000000000000000c;

# Neighbours are numbered clockwise starting with the one directly abo
+ve
my $NUM_NEIGH          = 8;
my $NEIGH_TOP          = 0;
my $NEIGH_TOP_RIGHT    = 1;
my $NEIGH_RIGHT        = 2;
my $NEIGH_BOTTOM_RIGHT = 3;
my $NEIGH_BOTTOM       = 4;
my $NEIGH_BOTTOM_LEFT  = 5;
my $NEIGH_LEFT         = 6;
my $NEIGH_TOP_LEFT     = 7;
# Note that the i ^ 4 trick sets i to the opposite one, that is:
#   0 > 4  (TOP > BOTTOM)
#   1 > 5  (TOP RIGHT > BOTTOM LEFT)
#   2 > 6  (RIGHT > LEFT)
#   3 > 7  (BOTTOM RIGHT > TOP LEFT)
#   4 > 0  (BOTTOM > TOP)
#   5 > 1  (BOTTOM LEFT > TOP RIGHT)
#   6 > 2  (LEFT > RIGHT)
#   7 > 3  (TOP LEFT > BOTTOM RIGHT)

# The functions starting with st64_ manipulate
# a simple 64 x 64 square tile bitmap.
# Note that x and y must be in 0..63 range.
# $row is a ref to an array of 64 unsigned 64-bit ints.
# The value in row[] bitmap is 0 (dead) or 1 (alive).

sub st64_getcellval {
   my ($row, $x, $y) = @_;
   my $mk = 1 << (63 - $x);
   return $row->[$y] & $mk ? 1 : 0;
}

sub st64_setcellval {
   my ($row, $x, $y, $v) = @_;
   my $mk = 1 << (63 - $x);
   if ($v) { $row->[$y] |= $mk; } else { $row->[$y] &= ~$mk; }
}

sub st64_insertcells {
   my $row = shift;
   for my $r (@_) { st64_setcellval($row, $r->[0], $r->[1], 1) }
}

# Used for verification and unit testing of st64_tiletick
sub st64_getlivecells {
   my $row = shift;
   my @cells;
   for my $y (0 .. 63) {
      next unless $row->[$y];
      for my $x (0 .. 63) {
         st64_getcellval($row, $x, $y) and push @cells, [ $x, $y ];
      }
   }
   sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells;
}

# Advance the interior of square tile by one tick.
# Return a two element list:
#   [0] : 1 if square tile changed, else 0.
#   [1] : neighbour flags (see NEIGH flags above)
#         indicates which neighbours need to be updated
sub st64_tiletick {
   my $row = shift;
   my $neigh = 0;
   my $bigdiff = 0;
   my @carry  = (0) x 64;
   my @parity = (0) x 64;
   my @diff   = (0) x 64;
   my ( $aa, $bb, $p, $q, $r, $s, $bit0, $bit1, $bit2 );
   my $top = 0;
   my $bottom = $TILE_SIZE_FULL_M1;
   while ($top < $TILE_SIZE_FULL_M1 && $row->[$top] == 0) { ++$top }
   while ($bottom > 0 && $row->[$bottom] == 0) { --$bottom }
   if ($top > $bottom) { return ( 0, $neigh ) }
   for my $i ($top .. $bottom) {
      $aa = $row->[$i] >> 1;
      $bb = $row->[$i] << 1;
      $q = $aa ^ $bb;
      $parity[$i] = $q ^ $row->[$i];
      $carry[$i] = ($q & $row->[$i]) | ($aa & $bb);
   }
   --$top; ++$bottom;
   if ($top < 1) { $top = 1 }
   if ($bottom > $TILE_SIZE_FULL_MB) { $bottom = $TILE_SIZE_FULL_MB }
   for my $i ($top .. $bottom) {
      $aa = $parity[$i-1];
      $bb = $parity[$i+1];
      $q = $aa ^ $bb;
      $bit0 = $q ^ $parity[$i];
      $r = ($q & $parity[$i]) | ($aa & $bb);

      $aa = $carry[$i-1];
      $bb = $carry[$i+1];
      $q = $aa ^ $bb;
      $p = $q ^ $carry[$i];
      $s = ($q & $carry[$i]) | ($aa & $bb);

      $bit1 = $p ^ $r;
      $bit2 = $s ^ ($p & $r);
      $p = ($bit0 & $bit1 & ~$bit2) | ($bit2 & ~$bit1 & ~$bit0 & $row-
+>[$i]);
      $diff[$i] = ($row->[$i] ^ $p) & $MIDDLE60;
      $bigdiff |= $diff[$i];
      $row->[$i] = ($p & $MIDDLE60) | ($row->[$i] & ~$MIDDLE60);
   }
   $aa = $diff[$BORDER_WIDTH]   | $diff[$BORDER_WIDTH_P1];
   $bb = $diff[$TILE_SIZE_CORE] | $diff[$TILE_SIZE_CORE_P1];
   if ($bigdiff) {
      if ($bigdiff & $LEFTMIDDLE)  { $neigh |= 1 << $NEIGH_LEFT }
      if ($bigdiff & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_RIGHT }
   }
   if ($aa) {
      $neigh |= 1 << $NEIGH_TOP;
      if ($aa & $LEFTMIDDLE)  { $neigh |= 1 << $NEIGH_TOP_LEFT }
      if ($aa & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_TOP_RIGHT }
   }
   if ($bb) {
      $neigh |= 1 << $NEIGH_BOTTOM;
      if ($bb & $LEFTMIDDLE)  { $neigh |= 1 << $NEIGH_BOTTOM_LEFT }
      if ($bb & $RIGHTMIDDLE) { $neigh |= 1 << $NEIGH_BOTTOM_RIGHT }
   }
   my $changed = ($bigdiff != 0) ? 1 : 0;
   return ( $changed, $neigh );
}

# Population count (https://en.wikipedia.org/wiki/Hamming_weight)
# See also GCC built-in: __builtin_popcount
sub popcount {
   my $x = shift;
   my $count;
   for ($count = 0; $x; ++$count) { $x &= $x - 1 }
   return $count;
}

# ----------------------------------------------------------------
# ORGANISM

sub count {
   my $self  = shift;
   my $tiles = $self->{Tiles};
   my $cnt = 0;
   for my $k (keys %{$tiles}) {
      my $row = $tiles->{$k}->{Row};
      for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) {
         next unless $row->[$y];
         $cnt += popcount($row->[$y] & $MIDDLE60);
      }
   }
   return $cnt;
}

# Input a list of [ x, y ] coords
sub insert_cells {
   my $self = shift;
   for my $r (@_) { $self->setcell($r->[0], $r->[1], 1) }
}

# Used for verification and testing the state of the organism
sub get_live_cells {
   my $self  = shift;
   my $tiles = $self->{Tiles};
   my @cells;
   for my $k (keys %{$tiles}) {
      my $sqt = $tiles->{$k};
      for my $y ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) {
         next unless $sqt->{Row}->[$y];
         for my $x ($BORDER_WIDTH .. $TILE_SIZE_CORE_P1) {
            if (st64_getcellval($sqt->{Row}, $x, $y)) {
               push @cells,
                  [$TILE_SIZE_CORE * $sqt->{Tx} + $x - $BORDER_WIDTH,
                   $TILE_SIZE_CORE * $sqt->{Ty} + $y - $BORDER_WIDTH];
            }
         }
      }
   }
   sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @cells;
}

sub get_neighbour {
   my $self = shift;
   my $sqt  = shift;
   my $i    = shift;
   unless ($sqt->{Neighbours}->[$i]) {
      my $x = $sqt->{Tx};
      my $y = $sqt->{Ty};
      if ($i >= $NEIGH_TOP_RIGHT    && $i <= $NEIGH_BOTTOM_RIGHT) { ++
+$x }
      if ($i >= $NEIGH_BOTTOM_RIGHT && $i <= $NEIGH_BOTTOM_LEFT)  { ++
+$y }
      if ($i >= $NEIGH_BOTTOM_LEFT  && $i <= $NEIGH_TOP_LEFT)     { --
+$x }
      if ($i == $NEIGH_TOP_LEFT     || $i <= $NEIGH_TOP_RIGHT)    { --
+$y }
      my $tiles = $self->{Tiles};
      my $k = pack 'i2', $x, $y;
      unless (exists $tiles->{$k}) {
         $tiles->{$k} = {
            Row => [ (0) x 64 ],
            Tx => $x,
            Ty => $y,
            Updateflags => 0,
            Neighbours => [],
         };
      }
      $sqt->{Neighbours}->[$i] = $tiles->{$k};
      $sqt->{Neighbours}->[$i]->{Tx} = $x;
      $sqt->{Neighbours}->[$i]->{Ty} = $y;
   }
   return $sqt->{Neighbours}->[$i];
}

# Alert the neighbour that its neighbour (the original tile) has chang
+ed
sub update_neighbour {
   my $self = shift;
   my $sqt  = shift;
   my $i    = shift;
   if ($self->get_neighbour($sqt, $i)->{Updateflags} == 0) {
      push @{$self->{Modified}}, $self->get_neighbour($sqt, $i);
   }
   $self->get_neighbour($sqt, $i)->{Updateflags} |= 1 << ($i ^ 4);
}

# Update the relevant portions of the boundary (a 64-by-64 square
# with the central 60-by-60 square removed) by copying data from
# the interiors (the 60-by-60 central squares) of the neighbours.
# Only perform this copying when necessary.
sub update_boundary {
   my $self = shift;
   my $sqt = shift;
   my $temp_modified = $self->{TempModified};
   if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_TOP);
      $sqt->{Row}->[0] = ($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) | 
+($sqt->{Row}->[0] & $OUTER4);
      $sqt->{Row}->[1] = ($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60)
+ | ($sqt->{Row}->[1] & $OUTER4);
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_LEFT) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_TOP_LEFT);
      $sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) <
+< $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $RIGHT62);
      $sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60
+) << $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $RIGHT62);
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_TOP_RIGHT) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_TOP_RIGHT);
      $sqt->{Row}->[0] = (($n->{Row}->[$TILE_SIZE_CORE] & $MIDDLE60) >
+> $TILE_SIZE_CORE) | ($sqt->{Row}->[0] & $LEFT62);
      $sqt->{Row}->[1] = (($n->{Row}->[$TILE_SIZE_CORE_P1] & $MIDDLE60
+) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[1] & $LEFT62);
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM);
      $sqt->{Row}->[$TILE_SIZE_FULL_MB] = ($n->{Row}->[$BORDER_WIDTH] 
+& $MIDDLE60) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB] & $OUTER4);
      $sqt->{Row}->[$TILE_SIZE_FULL_M1] = ($n->{Row}->[3] & $MIDDLE60)
+ | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $OUTER4);
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_LEFT) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_LEFT);
      $sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH]
+ & $MIDDLE60) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB
+] & $RIGHT62);
      $sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60
+) << $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $RIGHT62
+);
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_BOTTOM_RIGHT) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_BOTTOM_RIGHT);
      $sqt->{Row}->[$TILE_SIZE_FULL_MB] = (($n->{Row}->[$BORDER_WIDTH]
+ & $MIDDLE60) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_MB
+] & $LEFT62);
      $sqt->{Row}->[$TILE_SIZE_FULL_M1] = (($n->{Row}->[3] & $MIDDLE60
+) >> $TILE_SIZE_CORE) | ($sqt->{Row}->[$TILE_SIZE_FULL_M1] & $LEFT62)
+;
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_LEFT) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_LEFT);
      for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) {
         $sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) << $TILE_S
+IZE_CORE) | ($sqt->{Row}->[$i] & $RIGHT62);
      }
   }
   if ( $sqt->{Updateflags} & (1 << $NEIGH_RIGHT) ) {
      my $n = $self->get_neighbour($sqt, $NEIGH_RIGHT);
      for my $i ($BORDER_WIDTH .. $TILE_SIZE_FULL_MB - 1) {
         $sqt->{Row}->[$i] = (($n->{Row}->[$i] & $MIDDLE60) >> $TILE_S
+IZE_CORE) | ($sqt->{Row}->[$i] & $LEFT62);
      }
   }
   $sqt->{Updateflags} = 0;
   push @{$temp_modified}, $sqt;
}

# Advance the interior of the tile by one generation.
sub update_tile {
   my $self = shift;
   my $modified = $self->{Modified};
   my $sqt  = shift;
   my ($update_flag, $neigh) = st64_tiletick($sqt->{Row});
   if ($update_flag) {
      if ($sqt->{Updateflags} == 0) { push @{$modified}, $sqt }
      $sqt->{Updateflags} |= 1 << $NUM_NEIGH;
   }
   for my $i (0 .. $NUM_NEIGH - 1) {
      if ($neigh & (1 << $i)) { $self->update_neighbour($sqt, $i) }
   }
}

sub tick {
   my $self = shift;
   my $modified = $self->{Modified};
   my $temp_modified = $self->{TempModified};
   while (@{$modified}) {
      $self->update_boundary(pop @{$modified});
   }
   while (@{$temp_modified}) {
      $self->update_tile(pop @{$temp_modified});
   }
}

sub updatecell {
   my $self = shift;
   my $sqt  = shift;
   my $x    = shift;
   my $y    = shift;

   if ($sqt->{Updateflags} == 0) { push @{$self->{Modified}}, $sqt }
   $sqt->{Updateflags} |= 1 << $NUM_NEIGH;
   if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEIGH_
+TOP) }
   if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIGH_B
+OTTOM) }
   if ($x <= $BORDER_WIDTH_P1) {
      $self->update_neighbour($sqt, $NEIGH_LEFT);
      if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI
+GH_TOP_LEFT) }
      if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG
+H_BOTTOM_LEFT) }
   }
   if ($x >= $TILE_SIZE_CORE) {
      $self->update_neighbour($sqt, $NEIGH_RIGHT);
      if ($y <= $BORDER_WIDTH_P1) { $self->update_neighbour($sqt, $NEI
+GH_TOP_RIGHT) }
      if ($y >= $TILE_SIZE_CORE) { $self->update_neighbour($sqt, $NEIG
+H_BOTTOM_RIGHT) }
   }
}

sub setcell {
   my $self  = shift;
   my $x     = shift;
   my $y     = shift;
   my $state = shift;
   my $tiles = $self->{Tiles};

   my $ox = $x % $TILE_SIZE_CORE;
   my $oy = $y % $TILE_SIZE_CORE;
   if ($ox < 0) { $ox += $TILE_SIZE_CORE }
   if ($oy < 0) { $oy += $TILE_SIZE_CORE }
   my $tx = ($x - $ox) / $TILE_SIZE_CORE;
   my $ty = ($y - $oy) / $TILE_SIZE_CORE;

   my $k = pack 'i2', $tx, $ty;
   unless (exists $tiles->{$k}) {
      $tiles->{$k} = {
         Row => [ (0) x 64 ],
         Tx => $tx,
         Ty => $ty,
         Updateflags => 0,
         Neighbours => [],
      };
   }
   my $xx = $ox + $BORDER_WIDTH;
   my $yy = $oy + $BORDER_WIDTH;
   st64_setcellval($tiles->{$k}->{Row}, $xx, $yy, $state);
   $self->updatecell($tiles->{$k}, $xx, $yy);
}

sub getcellval {
   my $self = shift;
   my $x    = shift;
   my $y    = shift;
   my $tiles = $self->{Tiles};

   my $ox = $x % $TILE_SIZE_CORE;
   my $oy = $y % $TILE_SIZE_CORE;
   if ($ox < 0) { $ox += $TILE_SIZE_CORE }
   if ($oy < 0) { $oy += $TILE_SIZE_CORE }
   my $tx = ($x - $ox) / $TILE_SIZE_CORE;
   my $ty = ($y - $oy) / $TILE_SIZE_CORE;
   my $k = pack 'i2', $tx, $ty;
   exists $tiles->{$k} or return 0;
   return st64_getcellval( $tiles->{$k}->{Row},
                           $ox + $BORDER_WIDTH,
                           $oy + $BORDER_WIDTH );
}

sub new {
   my $class = shift;
   my %init_self = ( Tiles => {}, Modified => [], TempModified => [] )
+;
   bless \%init_self, $class;
}

1;
[download]

Note that this new implementation passes all the same tests (tgol.t, tgol2.t, tgol3.t) described in High Performance Game of Life.

References

High Performance Game of Life

golly
apgnano (initial Life128 release from Adam P. Goucher (apg))
apgmera (later vlife release from apg)
lifelib (new lifelib release from apg)
lifelib (lifewiki: description of lifelib)
apgsearch (lifewiki: description of apgsearch)
apgcode (lifewiki: description of apgcode)
qlifealgo.h
LifeAPI
Lidka
Methuselah
Description of vlife and Life128 algorithms
Discussion of Life128 algorithm
Graphics Programming Black Book by Michael Abrash (discusses how to write fast code)

Bit array (wikipedia)
Hamming weight (wikipedia, aka popcount)
How to do popcount (aka Hamming weight) in Perl (popcount References)
Bitwise operators on binary objects
Bitwise operation (wikipedia)
Hacker's Delight book by Henry S. Warren Jr
Add two integers using bitwise operators (stack overflow)
Adder (electronics) (see full-adder and half-adder)
Logic gate (wikipedia)
Truth table (wikipedia)
Boolean function (wikipedia)
Four bit adder (Rosetta code)
How to simulate 4-bit binary adder in C? (Stack Overflow)
Full adder using C pre-processor

perlnumber
Integer Arithmetic (use integer)

Re^2: What's Perl good at or better than Python (Game of Life, LLiL, Rosetta and Performance References)

Updated: Added more references.

Comment on More Betterer Game of Life Download Code

Replies are listed 'Best First'.

Re: More Betterer Game of Life
by AppleFritter (Vicar) on Sep 20, 2017 at 10:22 UTC

That's really cool, excellent work. I'm almost surprised it became that much faster even in unoptimized form.

For optimum performance, standing on the shoulders of giants and creating an XS wrapper around lifelib would obviously be best, this would have the added advantage of supporting all (classes of) CAs that lifelib supports. That said, having a pure-Perl implementation that will run anywhere that Perl will is obviously valuable as well: lifelib is tied to a specific architecture and a handful of instruction sets. And to say you're beating the existing CPAN solutions would be an understatement.

(Speaking of CPAN: you do intend to eventually bundle this up and release it as a module, right?)

[reply]

Re^2: More Betterer Game of Life

by eyepopslikeamosquito (Archbishop) on Sep 21, 2017 at 21:21 UTC

For optimum performance, standing on the shoulders of giants and creating an XS wrapper around lifelib would obviously be best, this would have the added advantage of supporting all (classes of) CAs that lifelib supports. ... (Speaking of CPAN: you do intend to eventually bundle this up and release it as a module, right?)

[reply]

Re^3: More Betterer Game of Life

by AppleFritter (Vicar) on Sep 25, 2017 at 11:23 UTC

Thanks, I really like your idea of a XS wrapper around LifeLib.

Better yet, how about a Life module with different pluggable backends? Math::CellularAutomaton, say, using Math::CellularAutomaton::LifeLib, Math::CellularAutomaton::Organism or Math::CellularAutomaton::Organism_PP under the hood, depending on what's installed, what's supported by the current CPU architecture, and what is able to handle the CA the user wants to run.

[reply]

Re: More Betterer Game of Life
by eyepopslikeamosquito (Archbishop) on Sep 22, 2017 at 22:55 UTC

I've updated Organism.pm to automatically detect the integer size perl was built with and use 32 x 32 tiles for a 32-bit perl and 64 x 64 tiles for a 64-bit perl. You can override this by editing Organism.pm and manually setting $TILE_SIZE_FULL to 32 or 64. I've tested with an old 32-bit Perl 5.8.6 and it works for me.

Updated Benchmark Results

As you might expect, Organism.pm runs a bit slower (and uses more memory) with 32-bit ints -- but not by much.

Version 375K cells 750K cells 1.5 million cells 3 million cells

new Organism.pm (64 x 64 tiles) 1 secs 1 secs 3 secs 5 secs

new Organism.pm (32 x 32 tiles) 1 secs 1 secs 4 secs 7 secs

Organism.pm (Mario improvements) 13 secs 26 secs 52 secs 108 secs

Organism.pm (Original) 35 secs 70 secs 141 secs 284 secs

Game::Life::Infinite:Board 37 secs 96 secs 273 secs 905 secs

As for memory use, the maximum Windows Private Bytes used for the three million cell case by each process was:

New Organism.pm (64 x 64 tiles): 700,000K - 1,100,000K (update: seems to vary)
New Organism.pm (32 x 32 tiles): 1,400,000K
Organism.pm (Original): 1,455,004K
Organism.pm (Mario improvements): 1,596,368K
Game::Life::Infinite:Board: 18,138,504K

Benchmark timings running AppleFritter's Lidka test for 30,000 ticks were:

Version Lidka 30,000 ticks

new Organism.pm (64 x 64 tiles) 58 secs

new Organism.pm (32 x 32 tiles) 86 secs

Organism.pm (Mario improvements) 450 secs

Organism.pm (Original) 1635 secs

Game::Life::Infinite:Board 640 secs

Updated Organism.pm follows.