http://qs321.pair.com?node_id=11115493


in reply to Re: Create parallel database handles... (MCE::Loop)
in thread Create parallel database handles or SQL statements for multi-threaded/process access to Postgres DB using DBI, DBD::Pg and Parallel::ForkManager

Thank you, 1nickt, for drawing my attention to the difference between DB connections and DB handles. That is a very good point.

I readily admit that:

  • I don't understand exactly how Perl uses a DBH or how Postgres sees it
  • using a separate DB connection for each child feels intuitively right
  • my understanding and use of the terms 'thread' and 'process' are not fully mature and are somewhat imprecise
  • I have never used SQL::Abstract, but after a cursory glance prompted by your example it looks like a good option that may save me some code (see the sketch after this list)
  • I've also never used MCE - it is not as easy to understand, and I'll have to look into it
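
From a first look at the SQL::Abstract docs, the basic pattern seems to be something like this (the connection details, table, and column names below are made up for illustration):

use strict;
use warnings;
use DBI;
use SQL::Abstract;

# placeholder connection details
my $dbh = DBI->connect( 'dbi:Pg:dbname=testdb', 'user', 'pass',
    { RaiseError => 1 } );

my $sql = SQL::Abstract->new;

# build the statement and bind values instead of writing the SQL by hand
my ($stmt, @bind) = $sql->insert( 'results', {
    run_id => 42,
    score  => 0.97,
});
# $stmt is: INSERT INTO results ( run_id, score ) VALUES ( ?, ? )

$dbh->do( $stmt, undef, @bind );
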
    Re^3: Create parallel database handles... (MCE::Loop)
    by 1nickt (Canon) on Apr 14, 2020 at 03:13 UTC

      Hi again perlygapes,

      The MCE::Loop code simply abstracts away all your Parallel::ForkManager logic and improves it, much as Parallel::ForkManager abstracts away and improves the tedious manual work of using fork() directly. Note how the logic is encapsulated in a sub just as in your code, only with less concurrency boilerplate.
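
      To make the correspondence concrete, here is a minimal sketch of the same worker loop written both ways (@jobs and do_one_job() are stand-ins, not code from your post):

      use strict;
      use warnings;
      use Parallel::ForkManager;
      use MCE::Loop;

      my @jobs = 1 .. 8;    # stand-in job list
      sub do_one_job { print "job $_[0] done by PID $$\n" }

      # Parallel::ForkManager: you manage the fork/finish/wait cycle yourself
      my $pm = Parallel::ForkManager->new(4);
      for my $job (@jobs) {
          $pm->start and next;    # fork; parent moves on to the next job
          do_one_job($job);       # child does the work
          $pm->finish;            # child exits
      }
      $pm->wait_all_children;

      # MCE::Loop: the same loop, with the fork/reap bookkeeping handled for you
      MCE::Loop->init( max_workers => 4, chunk_size => 1 );
      mce_loop { do_one_job($_) } @jobs;
      MCE::Loop->finish;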

      "using a separate DB connection instead for each child feels intuitively right"

      I agree; the code I shared opens a connection in each child, and each child stays alive and handles multiple jobs from the job list as managed by MCE.
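
      If it helps to see that pattern in isolation, here's a minimal sketch of one-connection-per-worker with MCE::Loop (the DSN, credentials, and table name are invented for illustration):

      use strict;
      use warnings;
      use DBI;
      use MCE::Loop;

      my $dbh;    # populated separately in each worker after the fork

      MCE::Loop->init(
          max_workers => 4,
          chunk_size  => 1,
          user_begin  => sub {
              # runs once per worker: open one connection, reuse it for all jobs
              $dbh = DBI->connect(
                  'dbi:Pg:dbname=testdb', 'user', 'pass',    # placeholders
                  { AutoCommit => 1, RaiseError => 1 },
              );
          },
          user_end => sub { $dbh->disconnect if $dbh },
      );

      mce_loop {
          my ($count) = $dbh->selectrow_array(
              'SELECT count(*) FROM items WHERE batch = ?', undef, $_ );
          MCE->say("batch $_: $count rows");
      } 1 .. 20;

      MCE::Loop->finish;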

      Here's a simpler example I've shared recently showing how to parallelize existing code for making a series of HTTP requests. How would you do the same using P::FM?

      single process

      use strict;
      use warnings;
      use 5.010;
      use Data::Dumper;
      use HTTP::Tiny;
      use Time::HiRes 'gettimeofday', 'tv_interval';

      my $ua = HTTP::Tiny->new( timeout => 10 );

      my @urls = qw<
          gap.com amazon.com ebay.com lego.com wunderground.com
          imdb.com underarmour.com disney.com espn.com dailymail.com
      >;

      my %report;

      foreach (@urls) {
          my $start = [gettimeofday];
          $ua->get('https://' . $_);
          $report{$_} = tv_interval($start, [gettimeofday]);
      }

      say Dumper \%report;

      six processes
      (workers stay alive, looping through the list, writing to a shared hash)
      (one added line, two slightly changed lines)

      use strict;
      use warnings;
      use 5.010;
      use Data::Dumper;
      use HTTP::Tiny;
      use Time::HiRes 'gettimeofday', 'tv_interval';
      use MCE;
      use MCE::Shared;

      my $ua = HTTP::Tiny->new( timeout => 10 );

      my @urls = qw<
          gap.com amazon.com ebay.com lego.com wunderground.com
          imdb.com underarmour.com disney.com espn.com dailymail.com
      >;

      my $report = MCE::Shared->hash;

      MCE->new( max_workers => 6 )->foreach( \@urls, sub {
          my $start = [gettimeofday];
          $ua->get('https://' . $_);
          $report->set( $_, tv_interval($start, [gettimeofday]) );
      });

      say Dumper $report->export;

      Update: fixed error in first demo code, ++choroba

      Hope this helps!



      The way forward always starts with a minimal test.
        Something I just realised I neglected to mention in my example is that I need to apply CPU affinity in the script. That is, I need to be able to specify that 'worker 1' MUST use CPU0, 'worker 2' MUST use CPU1, and so on.

        This is because I need another parallel code block in which each worker launches an external single-threaded executable that accesses another DB and writes results to a third DB, and these executables MUST NOT access and write to the same table at the same time. The affinity is, in essence, a way to avoid access conflicts/violations.

        How can this be done in MCE?

        Thanks again.

          Hi again,

          Now that's a classic XY problem statement! One usually gets better help by asking about how to achieve the goal, not how to implement the technique one has already decided is the way to achieve it ;-)

          I can think of no reason why one should ever have to concern oneself with which CPU core was used by a given worker. You should be able to write a program where you don't even have to concern yourself with workers.

          From your problem description, it sounds like you might need some kind of job queue. You can achieve this in many ways, but since you are already using MCE for parallelization, you can use MCE::Flow and MCE::Queue to enqueue jobs based on the output of a first task handled by multiple workers. Look at the demo shown in the MCE::Flow doc.
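
          As a rough illustration of the pattern (the worker counts and job payloads here are invented, not taken from your setup): one producer task feeds an MCE::Queue while a pool of consumer tasks drains it, so no two consumers ever hold the same job.

          use strict;
          use warnings;
          use MCE::Flow;
          use MCE::Queue;

          my $q = MCE::Queue->new;

          # task 0: a single producer enqueues the jobs, then one undef
          # per consumer so each knows when to stop
          sub producer {
              $q->enqueue($_) for 1 .. 20;
              $q->enqueue((undef) x 4);
          }

          # task 1: four consumers dequeue and process jobs until undef
          sub consumer {
              while (defined (my $job = $q->dequeue)) {
                  MCE->say("worker ", MCE->wid, " handling job $job");
              }
          }

          MCE::Flow->init( max_workers => [ 1, 4 ] );
          mce_flow \&producer, \&consumer;
          MCE::Flow->finish;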

          Hope this helps!


          The way forward always starts with a minimal test.