Ah... I see about 4.5 busy cores when running serially, which means Imager itself is using multiple cores behind the scenes. Well then, let's capture the compute time using Time::HiRes and increase from 9,999 to 99,999 iterations. I also captured the compute time on a 32-core AMD 3970X processor with SMT disabled, to better understand the benefit of chunking.
Here is the updated chunking demonstration to capture the compute time.
#!/usr/bin/perl

use strict;
use warnings;

use Imager;
use MCE::Loop;
use Time::HiRes 'time';

STDOUT->autoflush;

my $start = time;
my $count = 0;
my @data;

MCE::Loop->init(
    max_workers => MCE::Util::get_ncpu(),
    chunk_size  => 100,
    init_relay  => '',
    gather      => sub {
        if (@_ == 1) {
            print "\r", $count++;
        } else {
            push @data, @{ $_[1] };
        }
    }
);

mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;
    my @i_data;

    for my $x (@{ $chunk_ref }) {
        my $i = Imager->new(xsize=>120, ysize=>50)
            or die Imager->errstr;

        $i->string(
            text  => $x,
            color => Imager::Color->new('ffffff'),
            font  => Imager::Font->new(
                file => '/usr/share/fonts/truetype/msttcorefonts/cour.ttf',
              # file => '/System/Library/Fonts/Courier.dfont',
              # face => 'Courier New', # mswin
                size => 42,
                aa   => 1),
            x => 5,
            y => 35
        );

        # One cannot serialize Imager objects or will crash.
        # Instead save the image to a scalar and send that.
        # The manager process later reads from scalar refs.
        $i->write(data => \my $data, type => 'gif');
        push @i_data, $data;

        MCE->gather($x);
    }

    MCE::relay { MCE->gather($chunk_id, \@i_data) };

} [ 0 .. 99_999 ];

MCE::Loop->finish;

print " frame GIF done!\n";
printf "compute time: %0.3fs\n", time - $start;

Imager->write_multi({
    file => 'gif.gif', type => 'gif', gif_loop => 0, gif_delay => 1
}, map { Imager->read_multi(data => \$_) } @data) or die Imager->errstr;

printf "Total: %0.3fs\n", time - $start;
I captured the compute time for 99,999 iterations. The total time includes writing the GIF file.
            Compute    Total
Serial      28.685s    1m12.844s
Parallel     9.492s      22.588s
Chunking     3.772s      16.902s    SMT Disabled
Chunking     2.764s      15.982s    SMT Enabled
Relay runs in chunk_id order behind the scenes: workers wait their turn to run inside the relay code block. Chunking reduces the IPC overhead whenever a single item takes little time to compute, which is why all 32 cores reach 100% CPU utilization.
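To show the relay ordering on its own, here is a minimal sketch that reuses only the MCE calls from the script above (init_relay, MCE::relay, MCE->gather, and the chunk_id argument). The worker count, the chunk size, and the squaring workload are illustrative assumptions, not part of the benchmark.

```perl
use strict;
use warnings;
use MCE::Loop;

my (@ids, @squares);

MCE::Loop->init(
    max_workers => 4,     # assumption: small fixed count for the demo
    chunk_size  => 3,     # assumption: 30 items / 3 = 10 chunks
    init_relay  => '',    # enables MCE::relay, as in the script above
    gather      => sub {
        my ($chunk_id, $result_ref) = @_;
        push @ids, $chunk_id;
        push @squares, @{ $result_ref };
    },
);

mce_loop {
    my ($mce, $chunk_ref, $chunk_id) = @_;

    # The per-chunk work runs in parallel across workers.
    my @result = map { $_ * $_ } @{ $chunk_ref };

    # Workers take turns here by chunk_id, so one gather call
    # per chunk reaches the manager in chunk order.
    MCE::relay { MCE->gather($chunk_id, \@result) };

} [ 1 .. 30 ];

MCE::Loop->finish;

print "chunk ids: @ids\n";
print "squares:   @squares\n";
```

Each worker sends one message per chunk rather than one per item, which is the IPC saving that chunking is meant to provide.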
Running serially consumes about 4.5 cores from what I can tell (i.e. Imager itself uses more than one core). Chunking is roughly 7 times faster in compute time. That explains why it is not faster still: 4.5 * 7 = 31.5, which is about the number of cores (32) on the box I tested on.
Today, I learned that Imager or the underlying C library code runs in parallel behind the scenes.
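As a rough check of the ratios, here is a small sketch that recomputes the speedups from the timings in the table. The speedup helper and the run labels are hypothetical names introduced for illustration; the numbers are copied from the table, with 1m12.844s written as 72.844 seconds.

```perl
use strict;
use warnings;

# Timings in seconds, copied from the table (label, compute, total).
my @runs = (
    [ 'Serial',             28.685, 72.844 ],
    [ 'Parallel',            9.492, 22.588 ],
    [ 'Chunking (SMT off)',  3.772, 16.902 ],
    [ 'Chunking (SMT on)',   2.764, 15.982 ],
);

# Return how many times faster $time is compared with $baseline.
sub speedup {
    my ($baseline, $time) = @_;
    return $baseline / $time;
}

# The serial run is the baseline for both columns.
my (undef, $serial_compute, $serial_total) = @{ $runs[0] };

for my $run (@runs) {
    my ($label, $compute, $total) = @{ $run };
    printf "%-20s compute %5.2fx  total %5.2fx\n",
        $label,
        speedup($serial_compute, $compute),
        speedup($serial_total,   $total);
}
```

The total-time ratios come out smaller than the compute-time ratios, presumably because the total also includes writing the GIF file.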
Regards, Mario