PerlMonks  

Re: Why isn't this code thread-safe? (MCE!)

by BrowserUk (Patriarch)
on Nov 10, 2018 at 08:09 UTC ( [id://1225516]=note )


in reply to Why isn't this code thread-safe? (Is "require" thread-safe??)

The problem is almost certainly that CAM::PDF or one of its many dependencies isn't thread-safe. I vaguely recall finding that one of the GZIP modules used some global state internally at the C/XS level. There are ways around it (using threads), but they require knowledge and testing I'm no longer in a position to supply. In any case, there is a better way for this type of application: marioroy's MCE.

He has posted dozens of well written and tested examples of exactly this type of IO multi-tasking problem, and has proven himself willing and able to actively support those using his code.

And finally, even my best attempts at IO multitasking using threads never came close to achieving the same level of throughput and performance that he achieved with his early versions; and his latest stuff is even more efficient.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". The enemy of (IT) success is complexity.
In the absence of evidence, opinion is indistinguishable from prejudice. Suck that fhit

Re^2: Why isn't this code thread-safe? (MCE!)
by vr (Curate) on Nov 10, 2018 at 11:41 UTC
    I vaguely recall finding that one of the GZIP modules used some global state

    Thanks for the answer. I've reduced the problem to Compress::Zlib:

    use strict;
    use warnings;
    use feature 'say';
    use threads;
    use Thread::Queue;

    my $q = Thread::Queue-> new;

    my @gang = map async( sub {
        while ( defined( my $f = $q-> dequeue )) {
            require Compress::Zlib;
            say threads-> tid;
        }
    }), 1 .. 4;

    select( undef, undef, undef, 0.1 ) or $q-> enqueue( $_ ) for 1 .. 4;
    $q-> end;

    $_-> join for @gang;

    With small (or no) delays between threads "requiring" Compress::Zlib, errors happen:

    D:\>perl test181110.pl
    2
    3
    1
    4

    D:\>perl test181110.pl
    2
    4
    3
    1

    D:\>perl test181110.pl
    3
    2
    1
    4

    D:\>perl test181110.pl
    String found where operator expected at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/IO/Compress/Base/Common.pm line 514, near "croak "$sub: $p->[IxError]""
            (Do you need to predeclare croak?)
    Thread 3 terminated abnormally: syntax error at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/IO/Compress/Base/Common.pm line 514, near "croak "$sub: $p->[IxError]""
    BEGIN not safe after errors--compilation aborted at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/IO/Compress/Base/Common.pm line 520.
    Compilation failed in require at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/Compress/Zlib.pm line 10.
    BEGIN failed--compilation aborted at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/Compress/Zlib.pm line 10.
    Compilation failed in require at test181110.pl line 11.
    1
    4
    2

    D:\>perl test181110.pl
    Thread 3 terminated abnormally: Bareword "HIGH" not allowed while "strict subs" in use at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/IO/Compress/Base/Common.pm line 868.
    BEGIN not safe after errors--compilation aborted at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/IO/Compress/Base/Common.pm line 1038.
    Compilation failed in require at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/Compress/Zlib.pm line 10.
    BEGIN failed--compilation aborted at C:/strawberry-perl-5.28.0.1-32bit-PDL/perl/lib/Compress/Zlib.pm line 10.
    Compilation failed in require at test181110.pl line 11.
    1
    2
    4

    D:\>perl test181110.pl
    1
    3
    1
    2

    D:\>perl test181110.pl
    1
    3
    2
    4

    But if the delay is increased to 0.3, the script runs hundreds of times (from a batch command) without errors. Of course, the delay value is relevant to one PC only; instead of relying on delays, the module should be "used" in the main thread.
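    A minimal sketch of that last suggestion, assuming the same queue-and-workers layout as the script above. The results queue $r and the @tids array are hypothetical additions for illustration, so the outcome can be inspected after the threads have joined. Because Compress::Zlib is compiled before any thread is created, each worker inherits the loaded module, and the require inside the loop finds it already in %INC.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Compress::Zlib ();    # compiled once, in the main thread, before any worker exists

my $q = Thread::Queue->new;    # work items
my $r = Thread::Queue->new;    # hypothetical results queue, for illustration only

my @gang = map {
    threads->create( sub {
        while ( defined( my $f = $q->dequeue ) ) {
            # Already present in %INC (cloned from the parent interpreter),
            # so no source file is read or compiled inside the thread.
            require Compress::Zlib;
            $r->enqueue( threads->tid );
        }
    } );
} 1 .. 4;

$q->enqueue( $_ ) for 1 .. 4;
$q->end;
$_->join for @gang;

# Collect one thread id per processed item.
my @tids;
while ( defined( my $tid = $r->dequeue_nb ) ) {
    push @tids, $tid;
}
print "item handled by thread $_\n" for @tids;
```

    The order of the printed thread ids varies from run to run, as in the output above.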

    I don't understand why the text of the errors indicates incomplete or "noisy" reading (parsing) of Perl source files, as if random lines were simply omitted. And in the OP, the same "broken source" happens to the rather simple Text::PDF::Filter. As far as I can see, there's a relatively complex chain of requires from Text::PDF::Filter to Compress::Zlib and further into parts of that distribution. Maybe some other distributions have the same "long chain of requires" and could be checked with the script above?

      I think moving to requiring the module tree in the thread subs is probably compounding the problems.

      Imagine one thread gets a timeslice, gets part way through loading the module tree and then gets interrupted and another thread starts its attempt to load those same modules. If one of them has global state that is only used during loading, that interruption may leave it in an undefined state and the second thread inherits that state at the C level.

      If you really want to go that route, you should consider wrapping the require in a critical section to ensure that no two threads can be attempting to load the module tree concurrently.
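      One way such a critical section might look, as a sketch only: a shared variable guarded with lock() from threads::shared, so that only one thread at a time is inside the require. The names $load_lock, $r and @tids are hypothetical, and this has not been checked against the failure reported in this thread.

```perl
use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Queue;

my $load_lock :shared;         # hypothetical name; exists only to serialise module loading

my $q = Thread::Queue->new;    # work items
my $r = Thread::Queue->new;    # hypothetical results queue, for illustration only

my @gang = map {
    threads->create( sub {
        while ( defined( my $f = $q->dequeue ) ) {
            {
                lock( $load_lock );        # held until the end of this block
                require Compress::Zlib;    # at most one thread loads at a time
            }
            $r->enqueue( threads->tid );
        }
    } );
} 1 .. 4;

$q->enqueue( $_ ) for 1 .. 4;
$q->end;
$_->join for @gang;

# Collect one thread id per processed item.
my @tids;
while ( defined( my $tid = $r->dequeue_nb ) ) {
    push @tids, $tid;
}
print "item handled by thread $_\n" for @tids;
```

      The lock serialises only the loading step; the rest of the loop body still runs concurrently.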



      Hello vr,

      One can try loading IO::Handle before spawning workers. That alone is helpful for increasing reliability for modules that involve IO::*. I was able to reproduce threads failing but not after adding the IO::Handle line. Even tested with 100 threads without a delay between them.

      use warnings;
      use feature 'say';
      use threads;
      use Thread::Queue;
      use IO::Handle;    # <-- important

      my $q = Thread::Queue-> new;

      my @gang = map async( sub {
          while ( defined( my $f = $q-> dequeue )) {
              require Compress::Zlib;
              say threads-> tid;
          }
      }), 1 .. 8;

      $q-> enqueue( $_ ) for 1 .. 8;
      $q-> end;

      $_-> join for @gang;

      Kind regards, Mario

        Hi again,

        Here is the same thing using MCE::Hobo. Similar code, but processes instead.

        use warnings;
        use feature 'say';
        use MCE::Hobo;
        use MCE::Shared;

        # use IO::Handle;    # <-- loaded automatically by MCE and MCE::Shared::Server

        my $q = MCE::Shared-> queue;

        my @gang = map mce_async( sub {
            while ( defined( my $f = $q-> dequeue )) {
                require Compress::Zlib;
                say MCE::Hobo-> tid;
            }
        }), 1 .. 100;

        $q-> enqueue( $_ ) for 1 .. 100;
        $q-> end;

        $_-> join for @gang;

        For modules not multi-process safe, another thing one can do on Unix platforms is having MCE::Hobo default to posix_exit to avoid all END and destructor processing.

        use warnings;
        use feature 'say';
        use MCE::Hobo;
        use MCE::Shared;

        # use IO::Handle;    # <-- loaded automatically by MCE before spawning

        MCE::Hobo->init( posix_exit => 1 );

        my $q = MCE::Shared-> queue;

        my @gang = map mce_async( sub {
            while ( defined( my $f = $q-> dequeue )) {
                require Compress::Zlib;
                say MCE::Hobo-> tid;
            }
        }), 1 .. 8;

        $q-> enqueue( $_ ) for 1 .. 8;
        $q-> end;

        $_-> join for @gang;

        Note that the posix_exit option is not recommended if constructing an object inside the worker involving a temp file. In that case one may want the worker to exit normally. Anyway, the posix_exit option is there if needed as a last resort.

        Taken from the MCE::Hobo manual: Set posix_exit to avoid all END and destructor processing. Constructing MCE::Hobo inside a thread implies 1 or if present CGI, FCGI, Coro, Curses, Gearman::Util, Gearman::XS, LWP::UserAgent, Mojo::IOLoop, Prima, STFL, Tk, Wx, or Win32::GUI.

        A lot of modules are not multi-process safe, which is the reason posix_exit is set to 1 automatically when one of them is present. Btw, Prima has recently become multi-process safe.

        Kind regards, Mario

        So, the problem is not about Compress::Zlib (or poor Text::PDF::Filter... did we torture it beyond convalescing? oops, sorry). The minimal script is then:

        use strict;
        use warnings;
        use feature 'say';
        use threads;
        use Thread::Queue;

        my $q = Thread::Queue-> new;

        my @gang = map async( sub {
            while ( defined( my $f = $q-> dequeue )) {
                require IO::Handle;
                say threads-> tid;
            }
        }), 1 .. 8;

        $q-> enqueue( $_ ) for 1 .. @gang;
        $q-> end;

        $_-> join for @gang;

        which fails randomly, and often, on both Windows and Linux. Further, after replacing the require line with

        use XSLoader;
        XSLoader::load 'IO';

        there are sometimes strange warnings, like:

        Constant subroutine SEEK_END redefined at threads.pl line 4294967295

        Maybe this line number ("broken source"?) is important, maybe not. Until the issue is fixed (if ever), maybe it would be a good idea to advise (in the documentation, for example) to "use IO::Handle" in any threaded code? Won't hurt...

        As to the issue of "multi-process safe modules": I can't reproduce the same problem if the above script is simply rewritten using fork (Linux). Perhaps it's an unrelated issue?
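        The fork rewrite itself is not shown here. As a rough guess at its shape, assuming plain fork and waitpid with no work queue, each child could load IO::Handle in its own process and report the result through its exit status. The names @pids, %status and $failures are hypothetical.

```perl
use strict;
use warnings;

my @pids;
for my $n ( 1 .. 8 ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: load the module in this process only, then report
        # success (0) or failure (1) through the exit status.
        my $ok = eval { require IO::Handle; 1 };
        exit( $ok ? 0 : 1 );
    }

    push @pids, $pid;    # parent: remember the child to wait for
}

# Reap every child and record its raw wait status.
my %status;
for my $pid ( @pids ) {
    waitpid( $pid, 0 );
    $status{$pid} = $?;
}

my $failures = grep { $_ != 0 } values %status;
print "children with a failed require: $failures\n";
```

        Each child here has its own address space, so nothing loaded in one child is visible to another, unlike threads inside a single process.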

        And, certainly, thanks for showing that even a throw-away script of a few dozen lines is as easily written using MCE as in "vanilla Perl", except that it "just works", because the author has put in his knowledge and effort to take care of possible bugs. Thanks, Mario :)
