Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^4: Create parallel database handles... (MCE::Loop)

by perlygapes (Sexton)
on May 08, 2020 at 06:10 UTC ( [id://11116570]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Create parallel database handles... (MCE::Loop)
in thread Create parallel database handles or SQL statements for multi-threaded/process access to Postgres DB using DBI, DBD::Pg and Parallel::ForkManager

Something I just realised that I neglected to mention in my example was that I need to apply CPU affinity in the script. That is, I need to be able to specify that 'worker 1' MUST use CPU0, 'worker 2' MUST use CPU1, etc.

This is because I need to have another parallel code block where each worker launches an external single-threaded executable that will be accessing another DB and writing results to a third DB but these MUST NOT access and write to the same table at the same time. This affinity is in essence to avoid access conflicts/violations.

How can this be done in MCE?

Thanks again.

Replies are listed 'Best First'.
Re^5: Create parallel database handles... (MCE::Loop)
by 1nickt (Canon) on May 08, 2020 at 16:44 UTC

    Hi again,

    Now that's a classic XY problem statement! One usually gets better help by asking about how to achieve the goal, not how to implement the technique one has already decided is the way to achieve it ;-)

    I can think of no reason why one should ever have to concern oneself with which CPU core was used by a given worker. You should be able to write a program where you don't even have to concern yourself with workers.

    It sounds like from your problem description that you might need some kind of job queue. You can achieve this in many ways, but if you are already using MCE for parallelization, you can use MCE::Flow and MCE::Queue to handle enqueuing jobs based on the output of the first task handled by multiple workers. Look at the demo shown in the MCE::Flow doc.

    Hope this helps!


    The way forward always starts with a minimal test.
      I don't know why I missed your answer, but thanks again very much. Sorry it took me so long to respond.
      Yes, there is a very specific reason why I want to create CPU affinity: I am using this script to launch multiple instances of an old single threaded application and each instance is going to be working on the same overall dataset, but the dataset is a collection of files, and I do not want any file clobbering. I am specifically and purposefully trying to eliminate any chance of one of these processes interfering (even just reading) the same file another process is currently working on.

      Sorry if I seem to you to be asking basic questions - I am a plumber by trade...teaching oneself to code is very difficult.

      I tried amending your example slightly to this - and it does not give the result I expect with regard to the $process values:

      #!/usr/bin/perl use strict; use warnings; use 5.010; use Data::Dumper; use HTTP::Tiny; use Time::HiRes 'gettimeofday', 'tv_interval'; use MCE; use MCE::Shared; my $ua = HTTP::Tiny->new( timeout => 10 ); my @urls = qw< gap.com amazon.com ebay.com lego.com wunderground.com imdb.com underarmour.com disney.com espn.com dailymail.com >; my $report = MCE::Shared->hash; my $process = MCE::Shared->scalar; $process = 0; MCE->new( max_workers => 6 )->foreach( \@urls, sub { my $start = [gettimeofday]; $process++; say $process."->GETting https://".$_; $ua->get('https://' . $_); $report->set( $_, tv_interval($start, [gettimeofday]) ); }); say Dumper $report->export;

      The output reads:
      1->GETting https://gap.com 1->GETting https://amazon.com 1->GETting https://ebay.com 1->GETting https://lego.com 1->GETting https://wunderground.com 1->GETting https://imdb.com 2->GETting https://underarmour.com 2->GETting https://disney.com 2->GETting https://espn.com 3->GETting https://dailymail.com $VAR1 = bless( { 'disney.com' => '1.15682', 'amazon.com' => '4.607657', 'wunderground.com' => '0.46855', 'dailymail.com' => '2.355818', 'espn.com' => '1.170818', 'gap.com' => '3.819699', 'ebay.com' => '1.479624', 'underarmour.com' => '2.919818', 'imdb.com' => '2.540127', 'lego.com' => '0.919592' }, 'MCE::Shared::Hash' );
      whereas I had expected:
      1->GETting https://gap.com 2->GETting https://amazon.com 3->GETting https://ebay.com 4->GETting https://lego.com 5->GETting https://wunderground.com 6->GETting https://imdb.com 7->GETting https://underarmour.com 8->GETting https://disney.com 9->GETting https://espn.com 10->GETting https://dailymail.com $VAR1 = bless( { 'disney.com' => '1.15682', 'amazon.com' => '4.607657', 'wunderground.com' => '0.46855', 'dailymail.com' => '2.355818', 'espn.com' => '1.170818', 'gap.com' => '3.819699', 'ebay.com' => '1.479624', 'underarmour.com' => '2.919818', 'imdb.com' => '2.540127', 'lego.com' => '0.919592' }, 'MCE::Shared::Hash' );

      Can you explain this?

      Thanks.

        Greetings, perlygapes

        The shared variable is constructed using OO, not via the TIE interface. Therefore, assigning to 0 overwrites the variable. Incrementing is possible via $process->incr. Another way is MCE->chunk_id (2nd example). MCE::Mutex is helpful for one worker to access a resource, blocking others (3rd example).

        $process->incr()

        #!/usr/bin/perl use strict; use warnings; use 5.010; use Data::Dumper; use HTTP::Tiny; use Time::HiRes 'gettimeofday', 'tv_interval'; use MCE; use MCE::Shared; my $ua = HTTP::Tiny->new( timeout => 10 ); my @urls = qw< gap.com amazon.com ebay.com lego.com wunderground.com imdb.com underarmour.com disney.com espn.com dailymail.com >; my $report = MCE::Shared->hash; my $process = MCE::Shared->scalar(0); MCE->new( max_workers => 6 )->foreach( \@urls, sub { my $start = [gettimeofday]; say $process->incr()."->GETting https://".$_; $ua->get('https://' . $_); $report->set( $_, tv_interval($start, [gettimeofday]) ); }); say Dumper $report->export;

        MCE->chunk_id()

        #!/usr/bin/perl use strict; use warnings; use 5.010; use Data::Dumper; use HTTP::Tiny; use Time::HiRes 'gettimeofday', 'tv_interval'; use MCE; use MCE::Shared; my $ua = HTTP::Tiny->new( timeout => 10 ); my @urls = qw< gap.com amazon.com ebay.com lego.com wunderground.com imdb.com underarmour.com disney.com espn.com dailymail.com >; my $report = MCE::Shared->hash; MCE->new( max_workers => 6 )->foreach( \@urls, sub { my $start = [gettimeofday]; say MCE->chunk_id()."->GETting https://".$_; $ua->get('https://' . $_); $report->set( $_, tv_interval($start, [gettimeofday]) ); }); say Dumper $report->export;

        Assessing 3rd DB -- one worker

        use strict; use warnings; use 5.010; use Data::Dumper; use HTTP::Tiny; use Time::HiRes 'gettimeofday', 'tv_interval'; use MCE; use MCE::Shared; use MCE::Mutex; my $ua = HTTP::Tiny->new( timeout => 10 ); my @urls = qw< gap.com amazon.com ebay.com lego.com wunderground.com imdb.com underarmour.com disney.com espn.com dailymail.com >; my $report = MCE::Shared->hash; my $mutex = MCE::Mutex->new(); MCE->new( max_workers => 6 )->foreach( \@urls, sub { my $start = [gettimeofday]; say MCE->chunk_id()."->GETting https://".$_; $ua->get('https://' . $_); $report->set( $_, tv_interval($start, [gettimeofday]) ); # access 3rd DB, one worker $mutex->lock; # update ... $mutex->unlock; # ditto $mutex->enter( sub { # update ... }); }); say Dumper $report->export;

        Regards, Mario

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11116570]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (5)
As of 2024-03-29 13:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found