Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Hi perlygapes,

Here are a few observations.

  • You don't ever want to share a DBI handle across processes or threads.

  • You don't need to. Databases support multiple connections; you just don't want too many open at the same time, and you want to avoid the overhead of opening the connection repeatedly.

  • You are using Parallel::ForkManager, so calling the child processes 'threads' seems to be misleading.

  • Your use of Parallel::ForkManager (a great tool I used and loved for years) is akin to using a slide rule when a programmable calculator is on your desk. It's better than the abacus up on the shelf, but ...

Using MCE for parallelization has two huge advantages (as well as all the others, haha):

  • It spawns a pool of workers that perform repeated jobs from the job list.

  • It uses chunking so a worker handles a bunch of jobs each time its turn comes up to grab a new chunk.

I would write your code something like the following working example.

Schema:

create table mytable( field1 integer, field2 varchar(24), field3 varchar(24), field4 varchar(24), field5 varchar(24) );

use strict; use warnings; use Data::GUID; use DBD::Pg; use SQL::Abstract; use Tie::Cycle; use MCE::Loop max_workers => 4; my $sqla = SQL::Abstract->new; my @cols = map {"field$_"} 1..5; my $ins_sql = $sqla->insert('mytable', { map { $_ => '' } @cols }); my $sel_sql = $sqla->select('mytable', 'count(*)', { field1 => '' }); #--------------------------------------------------------------------- +# mce_loop { my ($mce, $chunk, $chunk_id) = @_; my $dbh = get_dbh(); my $ins_sth = $dbh->prepare_cached($ins_sql); my $sel_sth = $dbh->prepare_cached($sel_sql); for my $record( @{$chunk} ) { $ins_sth->execute( @{$record} ); $sel_sth->execute( 42 ); my ($count) = $sel_sth->fetchrow_array; my $msg = sprintf 'wid %s; chnk %s; ins %s; cnt %s', $mce->wid, $chunk_id, $record->[0], $count; MCE->say($msg); } } @{ get_sample_data() }; #--------------------------------------------------------------------- +# sub get_dbh { my $dsn = 'DBI:Pg:dbname=test_db'; my $dbh = DBI->connect($dsn, $ENV{USER}, undef, { AutoCommit => 1, RaiseError => 1, PrintError => 1 }) or die "Connection failed!\n" . $DBI::errstr; return $dbh; } sub get_sample_data { tie my $value1, 'Tie::Cycle', [ 40 .. 49 ]; return [ map { [ $value1, map { Data::GUID->new->as_base64 } 0..3] } 1..1000 ]; } __END__

Partial output:

$ perl mce-pg.pl wid 4; chnk 3; ins 40; cnt 0 wid 1; chnk 1; ins 40; cnt 0 wid 3; chnk 2; ins 40; cnt 0 wid 2; chnk 4; ins 40; cnt 0 wid 4; chnk 3; ins 41; cnt 0 wid 1; chnk 1; ins 41; cnt 0 wid 3; chnk 2; ins 41; cnt 0 wid 2; chnk 4; ins 41; cnt 1 wid 4; chnk 3; ins 42; cnt 1 wid 1; chnk 1; ins 42; cnt 2 ... wid 2; chnk 97; ins 48; cnt 100 wid 4; chnk 99; ins 48; cnt 100 wid 1; chnk 100; ins 48; cnt 100 wid 3; chnk 98; ins 49; cnt 100 wid 2; chnk 97; ins 49; cnt 100 wid 4; chnk 99; ins 49; cnt 100 wid 1; chnk 100; ins 49; cnt 100

Notes:

  • MCE sets the chunk size automatically (in this case to 100)

  • We get 100 rows from the DB count query with field1 value 42, as expected.

  • As seen in the 8th and 9th line of output:
    wid 2; chnk 4; ins 41; cnt 1 wid 4; chnk 3; ins 42; cnt 1
    ... after worker 2 inserts its record with value 41 and when it makes the select query, worker 4 has already inserted a record with value 42, but has not yet made the select query or written its output message (comes right after).

  • I always use SQL::Abstract to generate SQL statements as I trust it more than I trust me.

Hope this helps!


The way forward always starts with a minimal test.

In reply to Re: Create parallel database handles... (MCE::Loop) by 1nickt
in thread Create parallel database handles or SQL statements for multi-threaded/process access to Postgres DB using DBI, DBD::Pg and Parallel::ForkManager by perlygapes

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others pondering the Monastery: (8)
    As of 2020-10-30 14:15 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      My favourite web site is:












      Results (280 votes). Check out past polls.

      Notices?