Ah... I see about 4.5 busy cores when running serially, meaning that Imager itself uses multiple cores behind the scenes. Well then, let's capture the compute time using Time::HiRes and increase from 9,999 to 99,999 iterations. I also captured the compute time on a 32-core AMD 3970X processor with SMT disabled to better understand the benefit of chunking.

Here is the updated chunking demonstration to capture the compute time.

#!/usr/bin/perl

use strict;
use warnings;
use Imager;
use MCE::Loop;
use Time::HiRes 'time';

STDOUT->autoflush;

my $start = time;
my $count = 0;
my @data;

MCE::Loop->init(
  max_workers => MCE::Util::get_ncpu(),
  chunk_size  => 100,
  init_relay  => '',
  gather      => sub {
    if (@_ == 1) {
      print "\r", $count++;
    } else {
      push @data, @{ $_[1] };
    }
  }
);

mce_loop {
  my ($mce, $chunk_ref, $chunk_id) = @_;
  my @i_data;

  for my $x (@{ $chunk_ref }) {
    my $i = Imager->new(xsize=>120, ysize=>50)
      or die Imager->errstr;

    $i->string(
      text  => $x,
      color => Imager::Color->new('ffffff'),
      font  => Imager::Font->new(
        file  => '/usr/share/fonts/truetype/msttcorefonts/cour.ttf',
      # file  => '/System/Library/Fonts/Courier.dfont',
      # face  => 'Courier New', # mswin
        size  => 42,
        aa    => 1),
      x     => 5,
      y     => 35
    );

    # Imager objects cannot be serialized; trying to do so will crash.
    # Instead, write the image to a scalar and send that.
    # The manager process later reads the images back from scalar refs.

    $i->write(data => \my $data, type => 'gif');
    push @i_data, $data;

    MCE->gather($x);
  }

  MCE::relay { MCE->gather($chunk_id, \@i_data) };

} [ 0 .. 99_999 ];

MCE::Loop->finish;

print " frame GIF done!\n";
printf "compute time: %0.3fs\n", time - $start;

Imager->write_multi({
  file => 'gif.gif', type => 'gif', gif_loop => 0, gif_delay => 1
}, map { Imager->read_multi(data => \$_) } @data) or die Imager->errstr;

printf "Total: %0.3fs\n", time - $start;
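The comment in the worker loop about sending image bytes instead of Imager objects can be tried on its own. This is a minimal sketch that reuses the same `write` and `read_multi` calls as the script above; it assumes Imager was built with GIF support, and the blank 120x50 image is only a stand-in for a rendered frame.

```perl
use strict;
use warnings;
use Imager;

# Worker side: build an image and flatten it to a plain byte string.
my $img = Imager->new(xsize => 120, ysize => 50)
  or die Imager->errstr;

$img->write(data => \my $bytes, type => 'gif')
  or die $img->errstr;

# $bytes is an ordinary scalar, so it is safe to pass between processes.

# Manager side: rebuild Imager objects from the bytes.
my @frames = Imager->read_multi(data => \$bytes)
  or die Imager->errstr;

printf "%d bytes -> %d frame(s), %dx%d\n",
  length $bytes, scalar @frames,
  $frames[0]->getwidth, $frames[0]->getheight;
```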

I captured the compute time for 99,999 iterations. The total time includes writing the GIF file.

          Compute      Total

  Serial  28.685s  1m12.844s
Parallel   9.492s    22.588s
Chunking   3.772s    16.902s  SMT Disabled
Chunking   2.764s    15.982s  SMT Enabled

Relay runs in chunk_id order behind the scenes: workers wait their turn to run the relay CODE block. Chunking reduces the IPC overhead whenever a single item takes little time to compute, which is why I see all 32 cores at 100% CPU utilization.
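To see the relay ordering without Imager in the way, here is a stripped-down sketch of the same pattern. The doubling step, the 4 workers, and the 0 .. 999 input are placeholders of mine, not part of the benchmark; the `init_relay`, `gather`, and `MCE::relay` usage mirrors the script above.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use MCE::Loop;

my @ordered;   # filled by the manager process

MCE::Loop->init(
  max_workers => 4,     # placeholder; the benchmark uses get_ncpu()
  chunk_size  => 100,
  init_relay  => '',    # enables MCE::relay, as in the script above
  gather      => sub {
    my ($chunk_id, $results) = @_;
    push @ordered, @{ $results };
  }
);

mce_loop {
  my ($mce, $chunk_ref, $chunk_id) = @_;

  # Cheap per-item work runs in parallel across workers.
  my @results = map { $_ * 2 } @{ $chunk_ref };

  # Workers enter this block one at a time, ordered by chunk_id,
  # so the manager receives one message per chunk, in input order.
  MCE::relay { MCE->gather($chunk_id, \@results) };

} [ 0 .. 999 ];

MCE::Loop->finish;

my $expected = join ',', map { $_ * 2 } 0 .. 999;
print "items gathered: ", scalar(@ordered), "\n";
print "input order kept: ", (join(',', @ordered) eq $expected ? "yes" : "no"), "\n";
```

With chunk_size => 100, the 1,000 items cost 10 gather calls instead of 1,000, which is the IPC saving that chunking is after.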

Running serially consumes about 4.5 cores from what I can tell (i.e. Imager itself uses more than 1 core). Chunking (compute time) is about 7 times faster than serial. That explains why it is not faster still: 4.5 * 7 = 31.5, which is roughly the number of cores (32) on the box I tested on.
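For anyone who wants to check the ratios, this small sketch derives the speedups from the timings in the table, with the serial total of 1m12.844s written as 72.844s. The row labels are mine. The compute ratio for chunking with SMT disabled comes out a little above 7, which I rounded down.

```perl
use strict;
use warnings;

# [ label, compute seconds, total seconds ], copied from the table.
my @runs = (
  [ 'Serial',            28.685, 72.844 ],
  [ 'Parallel',           9.492, 22.588 ],
  [ 'Chunking, SMT off',  3.772, 16.902 ],
  [ 'Chunking, SMT on',   2.764, 15.982 ],
);

# The serial run is the baseline for both columns.
my ($base_compute, $base_total) = @{ $runs[0] }[1, 2];

for my $run (@runs) {
  my ($label, $compute, $total) = @{ $run };
  printf "%-18s compute %5.2fx   total %5.2fx\n",
    $label, $base_compute / $compute, $base_total / $total;
}
```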

Today, I learned that Imager, or the C library code it uses, runs in parallel behind the scenes.

Regards, Mario

In reply to Re^2: MCE segmentation fault by marioroy
in thread MCE segmentation fault by Anonymous Monk
