Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Perl threads test intermittently hangs on Windows

by eyepopslikeamosquito (Archbishop)
on Jun 03, 2004 at 03:29 UTC ( [id://359907]=perlquestion: print w/replies, xml ) Need Help??

eyepopslikeamosquito has asked for the wisdom of the Perl Monks concerning the following question:

Since this is my first ithreads program, I thought I better check that it is correct before perl-bugging it.

I am running the latest ActiveState Perl 5.8.3 build 809 on Windows XP. I would appreciate other monks checking that I'm not doing anything silly. If any monk has access to a Windows box, could you please run the test program below and let me know if it hangs for you? Thanks.

#!/usr/bin/perl -w # Intermittently hangs on win32. # The hangs appear to be a deadlock with both threads waiting # on a critical section. use strict; use threads; $|=1; sub do_one_thread { my $kid = shift; warn "kid=$kid $$ before require\n"; # Adding more requires seems to increase the chance of a hang. require File::Find; require CGI; warn "kid=$kid $$ after require\n"; } sub do_threads { my $nthreads = shift; my @kids = (); for my $i (1..$nthreads) { my $t = threads->new(\&do_one_thread, $i); warn "parent $$: continue\n"; push(@kids, $t); } for my $t (@kids) { warn "parent $$: waiting for join\n"; $t->join(); warn "parent $$: thread exited\n"; } } for my $i (1..1000) { print "i=$i:---\n"; do_threads(2); }

Replies are listed 'Best First'.
Re: Perl threads test intermittently hangs on Windows
by BrowserUk (Patriarch) on Jun 03, 2004 at 03:48 UTC

    I have identical setup--XP, AS809-- and your script runs to completion.

    There is a slight memory growth, from ~4MB to ~6MB, but given it creating/destroying 2000 threads each loading CGI which is no lightweight, that doesn't seem significant.

    ... i=999:--- parent 2104: continue parent 2104: continue parent 2104: waiting for join kid=1 2104 before require kid=1 2104 after require parent 2104: thread exited parent 2104: waiting for join kid=2 2104 before require kid=2 2104 after require parent 2104: thread exited i=1000:--- parent 2104: continue kid=1 2104 before require kid=1 2104 after require parent 2104: continue parent 2104: waiting for join parent 2104: thread exited parent 2104: waiting for join kid=2 2104 before require kid=2 2104 after require parent 2104: thread exited P:\test>perl -v This is perl, v5.8.3 built for MSWin32-x86-multi-thread (with 8 registered patches, see perl -V for more detail) Copyright 1987-2003, Larry Wall Binary build 809 provided by ActiveState Corp. http://www.ActiveState. +com ActiveState is a division of Sophos. Built Feb 3 2004 00:28:51

    I also added Date::Manip (about the heaviest module I have) to the list of requires. It slows it down considerably, and ups the memory usage to ~12MB - ~14 MB, but otherwise still completes without error.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
Re: Perl threads test intermittently hangs on Windows
by baruch (Beadle) on Jun 03, 2004 at 04:21 UTC

    I have the same version of Perl, running on XP (Home Edition). Your script doesn't hang for me.

    I don't see anything in your script that looks wrong. Could it be that your script is fine, and that this is an issue with Windows (difficult though that may be to believe)...?


    בּרוּך
      Could it be that your script is fine, and that this is an issue with Windows (difficult though that may be to believe)...?

      This bug seems highly sensitive to timing issues. If I put a little debug tracing in my Perl C sources, it still hangs, but if I put too much, the hang goes away!

      As for being a Windows bug, at first I dismissed that as being too bizarre. However, here is a debug dump from my 5.8.4 source code build at the time of one of the hangs. I understand how EnterCriticalSection can hang but I am at a loss to explain under what circumstances LeaveCriticalSection could hang. Maybe there is something wrong with my machine, I'll keep digging. Thanks.

      Microsoft (R) Windows Debugger Version 6.3.0005.1 Copyright (c) Microsoft Corporation. All rights reserved. *** wait with pending attach Symbol search path is: srv*C:\webmssymbols*http://msdl.microsoft.com/d +ownload/symbols Executable search path is: ModLoad: 00400000 00405000 C:\perl58\bin\perl.exe ModLoad: 77f50000 77ff7000 C:\WINDOWS\System32\ntdll.dll ModLoad: 77e60000 77f46000 C:\WINDOWS\system32\kernel32.dll ModLoad: 77c10000 77c63000 C:\WINDOWS\system32\MSVCRT.dll ModLoad: 28000000 280c0000 C:\perl58\bin\perl58.dll ModLoad: 77d40000 77dcc000 C:\WINDOWS\system32\USER32.dll ModLoad: 77c70000 77cb0000 C:\WINDOWS\system32\GDI32.dll ModLoad: 77dd0000 77e5d000 C:\WINDOWS\system32\ADVAPI32.dll ModLoad: 78000000 78086000 C:\WINDOWS\system32\RPCRT4.dll ModLoad: 10000000 10006000 C:\perl58\lib\auto\threads\threads.dll ModLoad: 00330000 00335000 C:\perl58\lib\auto\Cwd\Cwd.dll ModLoad: 00340000 00347000 C:\perl58\lib\auto\List\Util\Util.dll (f8c.1568): Break instruction exception - code 80000003 (first chance) eax=7ffdf000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=0 +0000005 eip=77f75a58 esp=0037ffcc ebp=0037fff4 iopl=0 nv up ei pl zr n +a po nc cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=0 +0000246 ntdll!DbgBreakPoint: 77f75a58 cc int 3 0:003> ~* kv 0 Id: f8c.17dc Suspend: 1 Teb: 7ffde000 Unfrozen ChildEBP RetAddr Args to Child 0140fc90 77f5c534 77e7a62d 00000be0 00000000 SharedUserData!SystemCall +Stub+0x4 (FPO: [0,0,0]) 0140fc94 77e7a62d 00000be0 00000000 00000000 ntdll!NtWaitForSingleObje +ct+0xc (FPO: [3,0,0]) 0140fcf8 77e7ac21 00000be0 ffffffff 00000000 kernel32!WaitForSingleObj +ectEx+0xa8 (FPO: [Non-Fpo]) *** WARNING: Unable to verify checksum for C:\perl58\lib\auto\threads\ +threads.dll 0140fd08 10001a0c 00000be0 ffffffff 00000004 kernel32!WaitForSingleObj +ect+0xf (FPO: [2,0,0]) 0140fd38 10002026 00224864 01941f7c 00224864 threads!Perl_ithread_join ++0x71 (CONV: cdecl) [threads.xs @ 572] *** WARNING: Unable to verify checksum for C:\perl58\bin\perl58.dll 0140fd5c 28040f91 00226ef0 01832e94 00000001 threads!XS_threads_join+0 +x7a (CONV: cdecl) [threads.xs @ 693] 0140fd9c 2805f803 01224864 00224864 28025080 perl58!Perl_pp_entersub+0 +x3ca (CONV: cdecl) [..\pp_hot.c @ 2857] 0140fda8 28025080 00224864 00224864 7ffdf000 perl58!Perl_runops_standa +rd+0xc (FPO: [1,0,1]) (CONV: cdecl) [..\run.c @ 23] 0140fdb8 28024f37 00224864 00000001 00000000 perl58!S_run_body+0xdd (F +PO: [2,0,2]) (CONV: cdecl) [..\perl.c @ 1921] 0140fe24 2808a52e 00224864 77f944a8 00000007 perl58!perl_run+0x17e (CO +NV: cdecl) [..\perl.c @ 1840] *** WARNING: Unable to verify checksum for C:\perl58\bin\perl.exe 0140ff3c 00401012 00000002 00222cc0 00223160 perl58!RunPerl+0x86 (CONV +: cdecl) [perllib.c @ 202] 0140ff4c 004010f9 00000002 00222cc0 00223160 perl!main+0x12 (FPO: [3,0 +,0]) (CONV: cdecl) [perlmain.c @ 17] 0140ffc0 77e814c7 77f944a8 00000007 7ffdf000 perl!mainCRTStartup+0xe3 0140fff0 00000000 00401016 00000000 00000000 kernel32!BaseProcessStart ++0x23 (FPO: [Non-Fpo]) 1 Id: f8c.f10 Suspend: 1 Teb: 7ffdd000 Unfrozen ChildEBP RetAddr Args to Child 0331fd38 77f5c534 77f69f68 000007ac 00000000 SharedUserData!SystemCall +Stub+0x4 (FPO: [0,0,0]) 0331fd3c 77f69f68 000007ac 00000000 00000000 ntdll!NtWaitForSingleObje +ct+0xc (FPO: [3,0,0]) 0331fdc4 77f5b2e6 00222d44 280877d9 00222d44 ntdll!RtlpWaitForCritical +Section+0x125 (FPO: [Non-Fpo]) 0331fdcc 280877d9 00222d44 28088542 0197308c ntdll!RtlEnterCriticalSec +tion+0x46 (FPO: [1,0,0]) 0331fdd4 28088542 0197308c 2808852f 01f9efac perl58!VMem::GetLock+0xa +(FPO: [0,0,0]) (CONV: thiscall) [vmem.h @ 259] 0331fddc 2808852f 01f9efac 28016df0 03bc4bb8 perl58!CPerlHost::FreeSha +red+0x12 (FPO: [1,0,1]) (CONV: thiscall) [perlhost.h @ 102] 0331fde4 28016df0 03bc4bb8 01f9efac 03bc4bb8 perl58!PerlMemSharedFree+ +0x10 (FPO: [2,0,0]) (CONV: cdecl) [perlhost.h @ 369] 0331fe08 28016b83 01d58014 0197308c 01d58014 perl58!S_cop_free+0x33 (F +PO: [2,0,3]) (CONV: cdecl) [..\op.c @ 529] 0331fe20 28016b55 01d58014 0197308c 01d58014 perl58!Perl_op_free+0xb8 +(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 360] 0331fe38 28016b55 01d58014 0192f5a8 03d09bdc perl58!Perl_op_free+0x8a +(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345] 0331fe50 28060cb4 01d58014 01973048 03c0690c perl58!Perl_op_free+0x8a +(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345] 0331fe74 2805fa1e 01d58014 0000009a 2803b014 perl58!Perl_leave_scope+0 +x2c1 (CONV: cdecl) [..\scope.c @ 893] 0331fe80 2803b014 01d58014 00000000 01d58014 perl58!Perl_pop_scope+0x1 +c (FPO: [1,0,0]) (CONV: cdecl) [..\scope.c @ 137] 0331feb4 2805f803 06d58014 01d58014 28025770 perl58!Perl_pp_leaveeval+ +0x1b2 (CONV: cdecl) [..\pp_ctl.c @ 3485] 0331fec0 28025770 01d58014 01d58014 28025618 perl58!Perl_runops_standa +rd+0xc (FPO: [1,0,1]) (CONV: cdecl) [..\run.c @ 23] 0331fecc 28025618 01d58014 0331ff54 00000000 perl58!S_call_body+0x33 ( +FPO: [3,0,1]) (CONV: cdecl) [..\perl.c @ 2285] 0331ff88 100013e5 01d58014 01a41af4 00000004 perl58!Perl_call_sv+0x37b + (CONV: cdecl) [..\perl.c @ 2224] 0331ffb4 77e7d33b 03d09bdc 03b20000 002201b0 threads!Perl_ithread_run+ +0x106 (CONV: stdcall) [threads.xs @ 297] 0331ffec 00000000 100012df 01941f6c 00000000 kernel32!BaseThreadStart+ +0x37 (FPO: [Non-Fpo]) 2 Id: f8c.ae0 Suspend: 1 Teb: 7ffdc000 Unfrozen ChildEBP RetAddr Args to Child 04f1fdbc 77f5c204 77f6a08d 000007ac 03c807d0 SharedUserData!SystemCall +Stub+0x4 (FPO: [0,0,0]) 04f1fdc0 77f6a08d 000007ac 03c807d0 77f5b39d ntdll!NtSetEventBoostPrio +rity+0xc (FPO: [1,0,0]) 04f1fdcc 77f5b39d 00222d44 280877e4 00222d44 ntdll!RtlpUnWaitCriticalS +ection+0x1f (FPO: [1,0,1]) 04f1fdd4 280877e4 00222d44 28088554 01910aa0 ntdll!RtlLeaveCriticalSec +tion+0x1d (FPO: [1,0,0]) 04f1fddc 28088554 01910aa0 2808852f 03e5605c perl58!VMem::FreeLock+0xa + (FPO: [0,0,0]) (CONV: thiscall) [vmem.h @ 274] 04f1fde4 2808852f 03e5605c 28016de3 03c804d0 perl58!CPerlHost::FreeSha +red+0x24 (FPO: [1,0,1]) (CONV: thiscall) [perlhost.h @ 104] 04f1fdec 28016de3 03c804d0 03e5605c 00000000 perl58!PerlMemSharedFree+ +0x10 (FPO: [2,0,0]) (CONV: cdecl) [perlhost.h @ 369] 04f1fe08 28016b83 01ef7014 01910aa0 01ef7014 perl58!S_cop_free+0x26 (F +PO: [2,0,3]) (CONV: cdecl) [..\op.c @ 517] 04f1fe20 28016b55 01ef7014 01910aa0 01ef7014 perl58!Perl_op_free+0xb8 +(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 360] 04f1fe38 28016b55 01ef7014 03cf7754 01bccac4 perl58!Perl_op_free+0x8a +(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345] 04f1fe50 28060cb4 01ef7014 0194c2cc 01add10c perl58!Perl_op_free+0x8a +(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345] 04f1fe74 2805fa1e 00000500 00000093 2803b014 perl58!Perl_leave_scope+0 +x2c1 (CONV: cdecl) [..\scope.c @ 893] 04f1fe80 2803b014 01ef7014 00000000 01ef7014 perl58!Perl_pop_scope+0x1 +c (FPO: [1,0,0]) (CONV: cdecl) [..\scope.c @ 137] 04f1feb4 2805f803 06ef7014 01ef7014 28025770 perl58!Perl_pp_leaveeval+ +0x1b2 (CONV: cdecl) [..\pp_ctl.c @ 3485] 04f1fec0 28025770 01ef7014 01ef7014 28025618 perl58!Perl_runops_standa +rd+0xc (FPO: [1,0,1]) (CONV: cdecl) [..\run.c @ 23] 04f1fecc 28025618 01ef7014 04f1ff54 00000000 perl58!S_call_body+0x33 ( +FPO: [3,0,1]) (CONV: cdecl) [..\perl.c @ 2285] 04f1ff88 100013e5 01ef7014 01aad984 00000004 perl58!Perl_call_sv+0x37b + (CONV: cdecl) [..\perl.c @ 2224] 04f1ffb4 77e7d33b 03e050b4 00000002 00220228 threads!Perl_ithread_run+ +0x106 (CONV: stdcall) [threads.xs @ 297] 04f1ffec 00000000 100012df 019fcdf4 00000000 kernel32!BaseThreadStart+ +0x37 (FPO: [Non-Fpo]) # 3 Id: f8c.1568 Suspend: 1 Teb: 7ffdb000 Unfrozen ChildEBP RetAddr Args to Child 0037ffc8 77f5f31f 00000005 00000004 00000001 ntdll!DbgBreakPoint (FPO: + [0,0,0]) 0037fff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+ +0x2d (FPO: [Non-Fpo])

        If it helps any, I left your script running using

        for /L %i in (1,1,1000) do echo %i & start 359907 & 359907
        for little over 5 hours and in that time, it had run to completion 108 times with nary a peep in the event log.

        It's not clear to me what you debugger trace output is showing?

        It's a (long) while since I used MS kernel debugger, but I seem to remember that it automatically suspended threads except the main thread until you manually enabled them?

        It couldn't be that it's the debugger itself that is leading to the impression that something is hanging?


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "Think for yourself!" - Abigail
Re: Perl threads test intermittently hangs on Windows (problem solved!)
by eyepopslikeamosquito (Archbishop) on Jun 15, 2004 at 02:43 UTC
    After reproducing the hangs with a smallish C program, I have finally found the root cause of the hangs. I updated my D865GBF Intel hyper-threaded P4 motherboard BIOS from version BF86510A.86A.0046.P10 to the latest version BF86510A.86A.0061.P17 and the hangs disappeared. This is the only time I can remember that a bug in my program turned out to be a firmware bug! :-)
Re: Perl threads test intermittently hangs on Windows
by eyepopslikeamosquito (Archbishop) on Jun 04, 2004 at 08:07 UTC

    First, thanks to the monks who have tested this program for me! I'm still hoping to find someone who can reproduce my hangs; then we can play Sherlock Holmes to try to figure what is different about the hanging machines versus the non-hanging ones. Oh, and I get the hangs with debug builds too, which is good news.

    I'm learning far more about critical sections than I ever wanted to. When I issue the !cs command in WinDbg to display all critical sections, I get this on the offending critical section:

    DebugInfo = 0x00043bf0 Critical section = 0x00222d44 (+0x222D44) LOCKED LockCount = 0x0 OwningThread = 0x00000000 RecursionCount = 0x0 LockSemaphore = 0x7B0 SpinCount = 0x00000000

    As noted here having an OwningThread of zero is "illogical" with a LockCount of zero:

    While examining a deadlocked program, we also found a state for which there seemed to be no logical explanation. The LockCount field for one heavily used critical section contained a value greater than -1; that is, it was owned, but the OwningThread field was zero (which destroyed evidence against the guilty thread). The test program was multithreaded, and the condition arose on both single and multiprocessor machines. Although LockCount and other values differed from run to run, the program always deadlocked on the same critical section. We'd definitely be curious to find out if any other developers out there have seen the API call sequence leading up to this state.

    So, maybe I should contact Matt Pietrek to investigate this bug for me. :-)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://359907]
Approved by PodMaster
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (5)
As of 2024-04-23 14:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found