http://qs321.pair.com?node_id=11113686


in reply to Re: MCE segmentation fault
in thread MCE segmentation fault

Ah... I see about 4.5 busy cores when running serially, meaning that Imager itself is using multiple cores behind the scenes. Well then, let's capture the compute time using Time::HiRes and increase from 9,999 to 99,999 iterations. I also captured the compute time on a 32-core AMD 3970X processor with SMT disabled to better understand the benefit of chunking.

Here is the updated chunking demonstration to capture the compute time.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Imager;
    use MCE::Loop;
    use Time::HiRes 'time';

    STDOUT->autoflush;

    my $start = time;
    my $count = 0;
    my @data;

    MCE::Loop->init(
        max_workers => MCE::Util::get_ncpu(),
        chunk_size  => 100,
        init_relay  => '',
        gather      => sub {
            if (@_ == 1) {
                print "\r", $count++;
            }
            else {
                push @data, @{ $_[1] };
            }
        }
    );

    mce_loop {
        my ($mce, $chunk_ref, $chunk_id) = @_;
        my @i_data;

        for my $x (@{ $chunk_ref }) {
            my $i = Imager->new(xsize=>120, ysize=>50) or die Imager->errstr;
            $i->string(
                text  => $x,
                color => Imager::Color->new('ffffff'),
                font  => Imager::Font->new(
                    file => '/usr/share/fonts/truetype/msttcorefonts/cour.ttf',
                  # file => '/System/Library/Fonts/Courier.dfont',
                  # face => 'Courier New',  # mswin
                    size => 42,
                    aa   => 1),
                x => 5,
                y => 35
            );

            # One cannot serialize Imager objects or will crash.
            # Instead save the image to a scalar and send that.
            # The manager process later reads from scalar refs.
            $i->write(data => \my $data, type => 'gif');
            push @i_data, $data;

            MCE->gather($x);
        }

        MCE::relay { MCE->gather($chunk_id, \@i_data) };

    } [ 0 .. 99_999 ];

    MCE::Loop->finish;

    print " frame GIF done!\n";
    printf "compute time: %0.3fs\n", time - $start;

    Imager->write_multi({
        file      => 'gif.gif',
        type      => 'gif',
        gif_loop  => 0,
        gif_delay => 1
    }, map { Imager->read_multi(data => \$_) } @data)
        or die Imager->errstr;

    printf "Total: %0.3fs\n", time - $start;

I captured the compute time for 99,999 iterations. The total time includes writing the GIF file.

                Compute    Total
    Serial      28.685s    1m12.844s
    Parallel     9.492s    22.588s
    Chunking     3.772s    16.902s    SMT Disabled
    Chunking     2.764s    15.982s    SMT Enabled

Behind the scenes, relay runs in order by chunk_id: workers wait their turn to run the code inside the relay CODE block. Chunking is a way to reduce the IPC overhead whenever a single item takes little time to compute. With chunking, I see all 32 cores at 100% CPU utilization.
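To make the relay and chunking pattern easier to see in isolation, here is a minimal sketch that strips out Imager and squares numbers instead. It assumes MCE is installed; the worker count, chunk size, and the @ordered array are illustrative choices, not values from the runs above. It uses the same calls as the demonstration (init_relay, MCE::relay, MCE->gather).

```perl
#!/usr/bin/perl

use strict;
use warnings;

use MCE::Loop;

my @ordered;

MCE::Loop->init(
    max_workers => 4,     # illustrative; the demonstration uses get_ncpu()
    chunk_size  => 5,     # one IPC round trip hands a worker 5 items
    init_relay  => '',    # enables relay, as in the demonstration
    gather      => sub { push @ordered, @{ $_[0] } },
);

mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;

    # The per-chunk work runs in parallel across workers.
    my @squares = map { $_ * $_ } @{ $chunk_ref };

    # Workers wait their turn here, by chunk_id, so the manager
    # receives the chunks in input order.
    MCE::relay { MCE->gather(\@squares) };

} [ 1 .. 20 ];

MCE::Loop->finish;

print "@ordered\n";
```

Only the gather inside the relay block is serialized; the map over the chunk still runs concurrently, which is why a larger chunk_size cuts IPC cost without giving up parallelism.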

Running serially consumes about 4.5 cores from what I can tell (i.e. Imager itself uses more than one core). Chunking is roughly 7 times faster in compute time. That explains why it is not faster still: 4.5 * 7 = 31.5, which is about the number of cores (32) on the box I tested on.
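The arithmetic can be checked with a few lines of Perl. This is only a back-of-the-envelope sketch: the timings come from the table, the 4.5 figure is my rough estimate from watching the cores, and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Compute times in seconds, taken from the table (chunking, SMT disabled).
my $serial_secs   = 28.685;
my $chunking_secs = 3.772;

my $serial_cores = 4.5;   # rough estimate of cores busy during the serial run
my $box_cores    = 32;    # physical cores on the test box

# Measured speedup of chunking over the serial run.
my $speedup = $serial_secs / $chunking_secs;

# Best-case speedup if the serial run already keeps ~4.5 cores busy.
my $ceiling = $box_cores / $serial_cores;

printf "measured speedup:  %.2fx\n", $speedup;
printf "estimated ceiling: %.2fx\n", $ceiling;
```

Both numbers land around 7, which is as close as one can expect given that the 4.5 is an eyeballed estimate.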

Today, I learned that Imager or the underlying C library code runs in parallel behind the scenes.

Regards, Mario