PerlMonks  

Re^3: Printing to STDERR causes deadlocks.

by bmann (Priest)
on Apr 26, 2005 at 21:41 UTC (id://451779)


in reply to Re^2: Printing to STDERR causes deadlocks.
in thread Printing to STDERR causes deadlocks.

I agree that yield and sleep won't solve it reliably - actually, I indicated that in my previous node.

The problem is the window between signaling the main thread and setting eof, and the window between testing whether eof is true and waiting on the shared variable. At least one of these pairs of steps needs to be atomic: you don't want to wait for more data once eof is true.
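For comparison, here is a minimal sketch of a one-slot handoff in which both windows are closed, because every test of the shared state and every cond_wait happens under the same lock on both sides. It is not the thread's getDataT; it assumes hypothetical shared variables $slot (undef means empty) and $done, and uses a fixed list of lines in place of the file handle.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical shared state: $slot holds one line (undef = empty) and
# $done marks end of input. Both are only touched while holding lock($slot).
my $slot :shared;
my $done :shared = 0;

my @input = map { "line $_\n" } 1 .. 50;    # stands in for the file handle

my $producer = threads->create(sub {
    for my $line (@input) {
        lock $slot;
        # Wait until the consumer has emptied the slot.
        cond_wait($slot) while defined $slot;
        $slot = $line;
        cond_signal($slot);        # signal while still holding the lock
    }
    lock $slot;
    $done = 1;                     # set under the same lock the consumer tests
    cond_signal($slot);
    return;
});

my @received;
READ: while (1) {
    my $line;
    {
        lock $slot;
        # Test the predicate and wait inside one locked region, so the
        # "is there data / are we done?" check cannot race with the producer.
        cond_wait($slot) until defined $slot or $done;
        last READ unless defined $slot;    # done and nothing left to take
        $line = $slot;
        $slot = undef;
        cond_signal($slot);        # tell the producer the slot is free
    }
    push @received, $line;         # process outside the lock
}
$producer->join;
```

The consumer leaves only when the slot is empty and $done is set, so the last line is not lost even if $done is set while that line is still sitting in the slot.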

How about this: replace getDataT with the version below. It sets $done before signaling that $sharedData is ready, which removes that race condition:

sub getDataT {
    my ( $handle, $sharedDataRef, $doneRef ) = @_;
    my $temp;
    while( !$$doneRef ) {
        warn "t-Locking" . $/;
        lock $$sharedDataRef;
        warn 't-Waiting' . $/;
        cond_wait( $$sharedDataRef ) while $$sharedDataRef;
        warn 't-Setting' . $/;
        $$sharedDataRef = $handle->getline;
        # set $done before handing the data over to the main thread
        $$doneRef = 1 if $handle->eof;
        warn 't-Signalling' . $/;
        cond_signal( $$sharedDataRef );
    }
    return;
}
I would expect that to scale gracefully.

Have you looked at what Thread::Semaphore actually does?
Just the docs. Now I have read the source... point taken ;)

Re^4: Printing to STDERR causes deadlocks.
by BrowserUk (Patriarch) on Apr 26, 2005 at 22:12 UTC

    It still hangs when tracing is enabled, except now it hangs every time (20 attempts). I've posted the modified code below in case I screwed something up. (I assumed the my $temp; was a leftover artifact?)

    I'd also tried several variations on this theme, without success.

    I've also tried using locking directly on $done and $$doneRef, more out of desperation than logic, but it made no difference.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco.
    Rule 1 has a caveat! -- Who broke the cabal?
      Okay, I've run it multiple times. It hangs for me too, but only intermittently (about 1 in 25 runs), and more often when the machine is under heavy load. I moved the warn statements after the lock in both subs and moved the "warn m-processing" above the signal to reduce the time the variable was unlocked. That seemed to help, but how do we quantify it?

      The $done race was not the only one. Looking at the TRACE output, the warn statements don't get executed in a consistent order. I guess that's to be expected, since the threads run asynchronously.

      However, when it hangs, I see one of two things: a missed signal or a signal being raised when the other thread isn't waiting.

      Now threads::shared says the following about the second condition:

      If there are no threads blocked in a "cond_wait" on the variable, the signal is discarded. By always locking before signaling, you can (with care), avoid signaling before another thread has entered cond_wait().
      Uh... what does "with care" mean?
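One common reading of "with care": change the shared state and signal while holding the lock, and have the waiter test that state under the same lock before each cond_wait, in a loop. A signal sent before anyone is waiting is then discarded harmlessly, because the waiter sees the state change and never blocks. A minimal sketch, assuming a hypothetical shared flag $ready:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my $ready :shared = 0;   # hypothetical predicate, guarded by lock($ready)

my $waiter = threads->create(sub {
    sleep 1;             # makes it very likely the signal below is sent first
    lock $ready;
    # If the signal was already discarded, the flag it announced is still
    # set, so this loop never calls cond_wait and nothing hangs.
    cond_wait($ready) until $ready;
    return 'woke';
});

{
    lock $ready;
    $ready = 1;          # change the state under the lock...
    cond_signal($ready); # ...then signal, even though nobody may be waiting yet
}

my $result = $waiter->join;
```

The loop also covers spurious wakeups: the waiter only proceeds once the predicate is actually true, whatever woke it.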

      Two more random notes:

      1. Random lines from the file being copied get skipped when TRACE=1, again because of missed signals.
      2. FWIW, 5.8.4 on Debian runs this correctly every time.
      My conclusion is that there's too much happening between lock, wait and signal.

        This is the bit I do not understand about the API. In your example:

        m-processing
        t-Locking
        t-Waiting
        t-Signalling    # if a tree signals in a forest, and no one's listening
        t-Locking
        t-Waiting       <<<< Point A
        m-locking
        m-waiting       # and again, we both wait...

        At point A, t has locked the shared var and goes into the wait state.

        Then m gets a timeslice, finishes processing and loops back to lock() the var. It gets the lock, but t hasn't yet released it?

        Then again, whilst m is processing, the lock it acquired is still in force, so how the hell did t manage to acquire a lock and move forward to the wait/signal/lock/wait steps?

        If it is possible to use this API to synchronise two threads' access to a single var, I'd sure like to see it.
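The step that makes the trace look impossible is that cond_wait gives the lock up while it blocks: the threads::shared documentation describes it as unlocking the variable, blocking until another thread signals that variable, and relocking it before returning. So at point A, t is waiting without holding the lock, which is how m can lock the variable. A minimal sketch that demonstrates this, with hypothetical shared variables $var and $waiting:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical shared state, all guarded by lock($var).
my $var     :shared = 0;
my $waiting :shared = 0;

my $t = threads->create(sub {
    lock $var;
    $waiting = 1;                 # set while holding the lock
    cond_wait($var) until $var;   # releases lock($var) while blocked
    # Back here, lock($var) is held again.
    return $var;
});

# $waiting is set while t holds the lock, and t only gives the lock up
# inside cond_wait. So if the main thread gets the lock and sees
# $waiting == 1, t must be blocked in cond_wait with the lock released.
my $saw_waiting = 0;
until ($saw_waiting) {
    {
        lock $var;
        if ($waiting) {
            $saw_waiting = 1;
            $var = 42;
            cond_signal($var);    # t wakes once this block drops the lock
        }
    }
    threads->yield;
}
my $got = $t->join;
```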


      FWIW, this code works fine for me, with and without tracing, on perl 5.8.1 on a 2-CPU 2GHz G5 Mac, with a 20MB text file (696k lines).

      Update: That reminds me. If your perl is ok, suspect pthreads. I've seen some majorly broken Linux versions (not sure what you're using).

        Thanks bluto. 5.8.1 had many other problems with threads. Maybe fixing those problems touched this, or maybe it is just the different implementations on mac/osx versus win. This stuff has been so little exercised that there is probably no way to tell.

        I no longer have 5.8.1, but I've tried various file sizes on 5.8.4, 5.8.5 and 5.8.6 without success :(

        I also tried using the two-argument version of cond_wait(), but I'll admit to not understanding how that's meant to work at the Perl level anyway.
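Per the threads::shared documentation, the two-argument form cond_wait($condvar, $lockvar) takes a shared, unlocked variable to wait on, followed by a shared, locked variable; it is the second one that is released while waiting and relocked on wakeup. That lets one lock guard the data while signals go through a separate variable. A minimal sketch with hypothetical names ($lockvar, $condvar, $count); taking lock($condvar) before signalling is a choice made here because cond_signal is documented as taking a locked variable:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Hypothetical: $lockvar guards $count; $condvar is only used for signalling.
my $lockvar :shared;
my $condvar :shared;
my $count   :shared = 0;

my $consumer = threads->create(sub {
    lock $lockvar;
    # Two-argument form: $condvar is NOT locked here, $lockvar is.
    # cond_wait releases $lockvar while blocked and relocks it on wakeup.
    cond_wait($condvar, $lockvar) until $count >= 3;
    return $count;
});

for (1 .. 3) {
    lock $lockvar;         # protects the update of $count
    $count++;
    lock $condvar;         # cond_signal is documented to take a locked variable
    cond_signal($condvar);
}                          # both locks are released at the end of each pass
my $final = $consumer->join;
```

The consumer never locks $condvar, so taking $lockvar then $condvar in the signalling loop cannot deadlock against it.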

