http://qs321.pair.com?node_id=667433

xaprb has asked for the wisdom of the Perl Monks concerning the following question:

I've written a program to dump MySQL data in parallel. Unfortunately it is behaving oddly. While two children are forked off running mysqldump shell commands, the parent just... hangs. It runs for a while, running waitpid() and forking off more children, but then sometimes it stops and does nothing anymore.

Without posting all the code, which is quite long, here's what I was able to isolate it to:

    # Wait for the MySQL server to become responsive.
    my $tries = 0;
    while ( !$dbh->ping && $tries++ < $opts{w} ) {
        sleep(1);
    }
    eval {

It hangs just after the ping and before the sleep(). I used DBI tracing to verify that it does call $dbh->ping. But it never gets past that.
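For reference, a minimal sketch of turning on that kind of tracing; the trace level and log path here are assumptions, not the actual settings used:

    use DBI;
    # Level 2 logs each method call into the driver; writing to a
    # file keeps the output separate from the program's own STDERR.
    DBI->trace( 2, '/tmp/parallel_dump_trace.log' );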

So I commented this out. This is non-essential code: it's only there to handle the MySQL server going away and returning, in case there's a crash while you dump the data.

That moved the point of no return just a few lines down, to here:

    # Start a new child process.
    while ( @work_to_do && $opts{m} > keys %kids ) {
        my $todo = shift @work_to_do;

Again, the hang is in the while() condition: it never reaches a 'print' placed just inside the loop. Meanwhile, top shows that the forked children go on to complete their work and exit, at which point they are defunct but never get reaped.
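For context, defunct ("zombie") children linger until the parent calls waitpid() on them. A minimal sketch of non-blocking reaping, assuming a %kids hash keyed by child PID as in the loop above:

    use POSIX ':sys_wait_h';

    # Reap every finished child without blocking the parent.
    while ( ( my $pid = waitpid( -1, WNOHANG ) ) > 0 ) {
        delete $kids{$pid};
    }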

This is not the only forking program I've written. I'm not an expert at it, but I've never had anything like this happen before.

So my question is, do you have any general advice for me as I try to figure out the problem?

Re: fork() and defunct children
by kyle (Abbot) on Feb 12, 2008 at 02:31 UTC

    Do you open your $dbh before you fork? If so, the child may be meddling with it. Have a look at "DBI, fork, and clone" for the gory details, but the nutshell version is that any connection open when the fork happens will be owned by both processes. This means that when the child exits, it will disconnect the parent's connection. It also means that if both processes try to use the handle, they're going to get very confused.

    A simple way to handle this is not to have a connection open during the fork.

    The other way to handle it is to have the child set $dbh->{InactiveDestroy} = 1 and then undef the $dbh.
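
    A minimal sketch of that second approach; the fork scaffolding around it is assumed, not kyle's code:

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: stop DESTROY from closing the parent's socket,
            # then drop the child's copy of the handle entirely.
            $dbh->{InactiveDestroy} = 1;
            undef $dbh;
            # ... do the child's work, connecting on its own if needed ...
            exit 0;
        }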

      Yes, and yes. But I use InactiveDestroy to prevent the DBH from going away because of the fork. I need to share the handle between the parent and the children because the parent might take a global FLUSH TABLES WITH READ LOCK and hold it open. During that time the children need to be able to work. I may be able to work around this, though.

        I don't know how (or whether) DBD::mysql handles sharing handles across processes, but I suspect what you're trying to do just won't work. If the parent has a lock, the children will have to respect it.
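
        A sketch of the separate-connections arrangement implied here; $dsn, $user, and $pass are placeholders:

            my $parent_dbh = DBI->connect( $dsn, $user, $pass, { RaiseError => 1 } );
            $parent_dbh->do('FLUSH TABLES WITH READ LOCK');

            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ( $pid == 0 ) {
                # Each child opens its own connection; the server still
                # enforces the parent's global read lock against it.
                my $child_dbh = DBI->connect( $dsn, $user, $pass, { RaiseError => 1 } );
                # ... read-only work such as SHOW CREATE TABLE ...
                $child_dbh->disconnect;
                exit 0;
            }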

Re: fork() and defunct children
by shmem (Chancellor) on Feb 11, 2008 at 20:45 UTC
    First of all,
    # Wait for the MySQL server to become responsive.
    my $tries = 0;
    while ( !$dbh->ping && $tries++ < $opts{w} ) {
        sleep(1);
    }

    I'd print some debug messages before the sleep() call. If they don't show up, it means the ping never returns. In that case I'd set up a local $SIG{ALRM} = \&report_fail handler to drag the flow out of $dbh->ping, printing as much informative output as I can get from the current scope, from DBI, and from the system it all runs on.
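
    A minimal sketch of that alarm technique; the five-second timeout is an assumption, and the inline die stands in for a report_fail routine:

        my $alive = eval {
            local $SIG{ALRM} = sub { die "ping timed out\n" };
            alarm 5;       # give the ping at most five seconds
            my $ok = $dbh->ping;
            alarm 0;       # cancel the pending alarm
            $ok;
        };
        warn "ping never returned: $@" unless defined $alive;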

    If that fails to give me a clue, I'd go on with wiretapping (e.g. Wireshark), then tracing/trussing the involved processes.

    --shmem

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: fork() and defunct children
by downer (Monk) on Feb 12, 2008 at 04:54 UTC
    As a general point, why are you trying to have multiple processes write to the DB? Do you think it will be faster? I suspect that since this will most likely be disk-bound (aside from the cost of building an index, which can be done after all updates are performed), having multiple processes go at it won't gain you much. Maybe it's OK to get rid of the confusion and have a single process do it. Otherwise, you can try writing the parts that would be forked as a separate script, then doing a system("otherScript.pl $data &") so your main script continues while the other script does its thing.
      Not write to -- read from. It won't be faster, but the children will mostly be shelling out system commands, and occasionally doing something like SHOW CREATE TABLE against the database. So the DB accesses will be few and short, while the children spend most of their time waiting for system() calls to finish.
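
      A sketch of the kind of child-side work being described; the mysqldump options and output naming are assumptions:

          # Dump one table; the child blocks here until mysqldump exits.
          my $status = system("mysqldump --single-transaction $db $table > $db.$table.sql");
          warn "mysqldump exited with status $?" if $status != 0;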