I have identical setup--XP, AS809-- and your script runs to completion.
There is a slight memory growth, from ~4MB to ~6MB, but given it creating/destroying 2000 threads each loading CGI which is no lightweight, that doesn't seem significant.
...
i=999:---
parent 2104: continue
parent 2104: continue
parent 2104: waiting for join
kid=1 2104 before require
kid=1 2104 after require
parent 2104: thread exited
parent 2104: waiting for join
kid=2 2104 before require
kid=2 2104 after require
parent 2104: thread exited
i=1000:---
parent 2104: continue
kid=1 2104 before require
kid=1 2104 after require
parent 2104: continue
parent 2104: waiting for join
parent 2104: thread exited
parent 2104: waiting for join
kid=2 2104 before require
kid=2 2104 after require
parent 2104: thread exited
P:\test>perl -v
This is perl, v5.8.3 built for MSWin32-x86-multi-thread
(with 8 registered patches, see perl -V for more detail)
Copyright 1987-2003, Larry Wall
Binary build 809 provided by ActiveState Corp. http://www.ActiveState.
+com
ActiveState is a division of Sophos.
Built Feb 3 2004 00:28:51
I also added Date::Manip (about the heaviest module I have) to the list of requires. It slows it down considerably, and ups the memory usage to ~12MB - ~14 MB, but otherwise still completes without error.
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] |
I have the same version of Perl, running on XP (Home Edition). Your script doesn't hang for me.
I don't see anything in your script that looks wrong. Could it be that your script is fine, and that this is an issue with Windows (difficult though that may be to believe)...?
| [reply] |
Could it be that your script is fine, and that this is an issue with Windows (difficult though that may be to believe)...?
This bug seems highly sensitive to timing issues. If I put a little debug tracing in my Perl C sources, it still hangs, but if I put too much, the hang goes away!
As for being a Windows bug, at first I dismissed that as being too bizarre. However, here is a debug dump from my 5.8.4 source code build at the time of one of the hangs. I understand how EnterCriticalSection can hang but I am at a loss to explain under what circumstances LeaveCriticalSection could hang. Maybe there is something wrong with my machine, I'll keep digging. Thanks.
Microsoft (R) Windows Debugger Version 6.3.0005.1
Copyright (c) Microsoft Corporation. All rights reserved.
*** wait with pending attach
Symbol search path is: srv*C:\webmssymbols*http://msdl.microsoft.com/d
+ownload/symbols
Executable search path is:
ModLoad: 00400000 00405000 C:\perl58\bin\perl.exe
ModLoad: 77f50000 77ff7000 C:\WINDOWS\System32\ntdll.dll
ModLoad: 77e60000 77f46000 C:\WINDOWS\system32\kernel32.dll
ModLoad: 77c10000 77c63000 C:\WINDOWS\system32\MSVCRT.dll
ModLoad: 28000000 280c0000 C:\perl58\bin\perl58.dll
ModLoad: 77d40000 77dcc000 C:\WINDOWS\system32\USER32.dll
ModLoad: 77c70000 77cb0000 C:\WINDOWS\system32\GDI32.dll
ModLoad: 77dd0000 77e5d000 C:\WINDOWS\system32\ADVAPI32.dll
ModLoad: 78000000 78086000 C:\WINDOWS\system32\RPCRT4.dll
ModLoad: 10000000 10006000 C:\perl58\lib\auto\threads\threads.dll
ModLoad: 00330000 00335000 C:\perl58\lib\auto\Cwd\Cwd.dll
ModLoad: 00340000 00347000 C:\perl58\lib\auto\List\Util\Util.dll
(f8c.1568): Break instruction exception - code 80000003 (first chance)
eax=7ffdf000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=0
+0000005
eip=77f75a58 esp=0037ffcc ebp=0037fff4 iopl=0 nv up ei pl zr n
+a po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=0
+0000246
ntdll!DbgBreakPoint:
77f75a58 cc int 3
0:003> ~* kv
0 Id: f8c.17dc Suspend: 1 Teb: 7ffde000 Unfrozen
ChildEBP RetAddr Args to Child
0140fc90 77f5c534 77e7a62d 00000be0 00000000 SharedUserData!SystemCall
+Stub+0x4 (FPO: [0,0,0])
0140fc94 77e7a62d 00000be0 00000000 00000000 ntdll!NtWaitForSingleObje
+ct+0xc (FPO: [3,0,0])
0140fcf8 77e7ac21 00000be0 ffffffff 00000000 kernel32!WaitForSingleObj
+ectEx+0xa8 (FPO: [Non-Fpo])
*** WARNING: Unable to verify checksum for C:\perl58\lib\auto\threads\
+threads.dll
0140fd08 10001a0c 00000be0 ffffffff 00000004 kernel32!WaitForSingleObj
+ect+0xf (FPO: [2,0,0])
0140fd38 10002026 00224864 01941f7c 00224864 threads!Perl_ithread_join
++0x71 (CONV: cdecl) [threads.xs @ 572]
*** WARNING: Unable to verify checksum for C:\perl58\bin\perl58.dll
0140fd5c 28040f91 00226ef0 01832e94 00000001 threads!XS_threads_join+0
+x7a (CONV: cdecl) [threads.xs @ 693]
0140fd9c 2805f803 01224864 00224864 28025080 perl58!Perl_pp_entersub+0
+x3ca (CONV: cdecl) [..\pp_hot.c @ 2857]
0140fda8 28025080 00224864 00224864 7ffdf000 perl58!Perl_runops_standa
+rd+0xc (FPO: [1,0,1]) (CONV: cdecl) [..\run.c @ 23]
0140fdb8 28024f37 00224864 00000001 00000000 perl58!S_run_body+0xdd (F
+PO: [2,0,2]) (CONV: cdecl) [..\perl.c @ 1921]
0140fe24 2808a52e 00224864 77f944a8 00000007 perl58!perl_run+0x17e (CO
+NV: cdecl) [..\perl.c @ 1840]
*** WARNING: Unable to verify checksum for C:\perl58\bin\perl.exe
0140ff3c 00401012 00000002 00222cc0 00223160 perl58!RunPerl+0x86 (CONV
+: cdecl) [perllib.c @ 202]
0140ff4c 004010f9 00000002 00222cc0 00223160 perl!main+0x12 (FPO: [3,0
+,0]) (CONV: cdecl) [perlmain.c @ 17]
0140ffc0 77e814c7 77f944a8 00000007 7ffdf000 perl!mainCRTStartup+0xe3
0140fff0 00000000 00401016 00000000 00000000 kernel32!BaseProcessStart
++0x23 (FPO: [Non-Fpo])
1 Id: f8c.f10 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr Args to Child
0331fd38 77f5c534 77f69f68 000007ac 00000000 SharedUserData!SystemCall
+Stub+0x4 (FPO: [0,0,0])
0331fd3c 77f69f68 000007ac 00000000 00000000 ntdll!NtWaitForSingleObje
+ct+0xc (FPO: [3,0,0])
0331fdc4 77f5b2e6 00222d44 280877d9 00222d44 ntdll!RtlpWaitForCritical
+Section+0x125 (FPO: [Non-Fpo])
0331fdcc 280877d9 00222d44 28088542 0197308c ntdll!RtlEnterCriticalSec
+tion+0x46 (FPO: [1,0,0])
0331fdd4 28088542 0197308c 2808852f 01f9efac perl58!VMem::GetLock+0xa
+(FPO: [0,0,0]) (CONV: thiscall) [vmem.h @ 259]
0331fddc 2808852f 01f9efac 28016df0 03bc4bb8 perl58!CPerlHost::FreeSha
+red+0x12 (FPO: [1,0,1]) (CONV: thiscall) [perlhost.h @ 102]
0331fde4 28016df0 03bc4bb8 01f9efac 03bc4bb8 perl58!PerlMemSharedFree+
+0x10 (FPO: [2,0,0]) (CONV: cdecl) [perlhost.h @ 369]
0331fe08 28016b83 01d58014 0197308c 01d58014 perl58!S_cop_free+0x33 (F
+PO: [2,0,3]) (CONV: cdecl) [..\op.c @ 529]
0331fe20 28016b55 01d58014 0197308c 01d58014 perl58!Perl_op_free+0xb8
+(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 360]
0331fe38 28016b55 01d58014 0192f5a8 03d09bdc perl58!Perl_op_free+0x8a
+(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345]
0331fe50 28060cb4 01d58014 01973048 03c0690c perl58!Perl_op_free+0x8a
+(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345]
0331fe74 2805fa1e 01d58014 0000009a 2803b014 perl58!Perl_leave_scope+0
+x2c1 (CONV: cdecl) [..\scope.c @ 893]
0331fe80 2803b014 01d58014 00000000 01d58014 perl58!Perl_pop_scope+0x1
+c (FPO: [1,0,0]) (CONV: cdecl) [..\scope.c @ 137]
0331feb4 2805f803 06d58014 01d58014 28025770 perl58!Perl_pp_leaveeval+
+0x1b2 (CONV: cdecl) [..\pp_ctl.c @ 3485]
0331fec0 28025770 01d58014 01d58014 28025618 perl58!Perl_runops_standa
+rd+0xc (FPO: [1,0,1]) (CONV: cdecl) [..\run.c @ 23]
0331fecc 28025618 01d58014 0331ff54 00000000 perl58!S_call_body+0x33 (
+FPO: [3,0,1]) (CONV: cdecl) [..\perl.c @ 2285]
0331ff88 100013e5 01d58014 01a41af4 00000004 perl58!Perl_call_sv+0x37b
+ (CONV: cdecl) [..\perl.c @ 2224]
0331ffb4 77e7d33b 03d09bdc 03b20000 002201b0 threads!Perl_ithread_run+
+0x106 (CONV: stdcall) [threads.xs @ 297]
0331ffec 00000000 100012df 01941f6c 00000000 kernel32!BaseThreadStart+
+0x37 (FPO: [Non-Fpo])
2 Id: f8c.ae0 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr Args to Child
04f1fdbc 77f5c204 77f6a08d 000007ac 03c807d0 SharedUserData!SystemCall
+Stub+0x4 (FPO: [0,0,0])
04f1fdc0 77f6a08d 000007ac 03c807d0 77f5b39d ntdll!NtSetEventBoostPrio
+rity+0xc (FPO: [1,0,0])
04f1fdcc 77f5b39d 00222d44 280877e4 00222d44 ntdll!RtlpUnWaitCriticalS
+ection+0x1f (FPO: [1,0,1])
04f1fdd4 280877e4 00222d44 28088554 01910aa0 ntdll!RtlLeaveCriticalSec
+tion+0x1d (FPO: [1,0,0])
04f1fddc 28088554 01910aa0 2808852f 03e5605c perl58!VMem::FreeLock+0xa
+ (FPO: [0,0,0]) (CONV: thiscall) [vmem.h @ 274]
04f1fde4 2808852f 03e5605c 28016de3 03c804d0 perl58!CPerlHost::FreeSha
+red+0x24 (FPO: [1,0,1]) (CONV: thiscall) [perlhost.h @ 104]
04f1fdec 28016de3 03c804d0 03e5605c 00000000 perl58!PerlMemSharedFree+
+0x10 (FPO: [2,0,0]) (CONV: cdecl) [perlhost.h @ 369]
04f1fe08 28016b83 01ef7014 01910aa0 01ef7014 perl58!S_cop_free+0x26 (F
+PO: [2,0,3]) (CONV: cdecl) [..\op.c @ 517]
04f1fe20 28016b55 01ef7014 01910aa0 01ef7014 perl58!Perl_op_free+0xb8
+(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 360]
04f1fe38 28016b55 01ef7014 03cf7754 01bccac4 perl58!Perl_op_free+0x8a
+(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345]
04f1fe50 28060cb4 01ef7014 0194c2cc 01add10c perl58!Perl_op_free+0x8a
+(FPO: [2,0,3]) (CONV: cdecl) [..\op.c @ 345]
04f1fe74 2805fa1e 00000500 00000093 2803b014 perl58!Perl_leave_scope+0
+x2c1 (CONV: cdecl) [..\scope.c @ 893]
04f1fe80 2803b014 01ef7014 00000000 01ef7014 perl58!Perl_pop_scope+0x1
+c (FPO: [1,0,0]) (CONV: cdecl) [..\scope.c @ 137]
04f1feb4 2805f803 06ef7014 01ef7014 28025770 perl58!Perl_pp_leaveeval+
+0x1b2 (CONV: cdecl) [..\pp_ctl.c @ 3485]
04f1fec0 28025770 01ef7014 01ef7014 28025618 perl58!Perl_runops_standa
+rd+0xc (FPO: [1,0,1]) (CONV: cdecl) [..\run.c @ 23]
04f1fecc 28025618 01ef7014 04f1ff54 00000000 perl58!S_call_body+0x33 (
+FPO: [3,0,1]) (CONV: cdecl) [..\perl.c @ 2285]
04f1ff88 100013e5 01ef7014 01aad984 00000004 perl58!Perl_call_sv+0x37b
+ (CONV: cdecl) [..\perl.c @ 2224]
04f1ffb4 77e7d33b 03e050b4 00000002 00220228 threads!Perl_ithread_run+
+0x106 (CONV: stdcall) [threads.xs @ 297]
04f1ffec 00000000 100012df 019fcdf4 00000000 kernel32!BaseThreadStart+
+0x37 (FPO: [Non-Fpo])
# 3 Id: f8c.1568 Suspend: 1 Teb: 7ffdb000 Unfrozen
ChildEBP RetAddr Args to Child
0037ffc8 77f5f31f 00000005 00000004 00000001 ntdll!DbgBreakPoint (FPO:
+ [0,0,0])
0037fff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+
+0x2d (FPO: [Non-Fpo])
| [reply] [d/l] |
for /L %i in (1,1,1000) do echo %i & start 359907 & 359907
for little over 5 hours and in that time, it had run to completion 108 times with nary a peep in the event log.
It's not clear to me what you debugger trace output is showing?
It's a (long) while since I used MS kernel debugger, but I seem to remember that it automatically suspended threads except the main thread until you manually enabled them?
It couldn't be that it's the debugger itself that is leading to the impression that something is hanging?
Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
| [reply] [d/l] |
After reproducing the hangs with a smallish C program, I have finally found the root cause of the hangs. I updated my D865GBF Intel hyper-threaded P4 motherboard BIOS from version BF86510A.86A.0046.P10 to the latest version BF86510A.86A.0061.P17 and the hangs disappeared. This is the only time I can remember that a bug in my program turned out to be a firmware bug! :-) | [reply] |
First, thanks to the monks who have tested this program for me! I'm still hoping to find someone who can reproduce my hangs; then we can play Sherlock Holmes to try to figure what is different about the hanging machines versus the non-hanging ones. Oh, and I get the hangs with debug builds too, which is good news.
I'm learning far more about critical sections than I ever wanted to. When I issue the !cs command in WinDbg to display all critical sections, I get this on the offending critical section:
DebugInfo = 0x00043bf0
Critical section = 0x00222d44 (+0x222D44)
LOCKED
LockCount = 0x0
OwningThread = 0x00000000
RecursionCount = 0x0
LockSemaphore = 0x7B0
SpinCount = 0x00000000
As noted
here
having an OwningThread of zero is "illogical" with a LockCount of zero:
While examining a deadlocked program, we also found a state for which there seemed to be no logical explanation. The LockCount field for one heavily used critical section contained a value greater than -1; that is, it was owned, but the OwningThread field was zero (which destroyed evidence against the guilty thread). The test program was multithreaded, and the condition arose on both single and multiprocessor machines. Although LockCount and other values differed from run to run, the program always deadlocked on the same critical section. We'd definitely be curious to find out if any other developers out there have seen the API call sequence leading up to this state.
So, maybe I should contact Matt Pietrek to investigate this bug for me. :-)
| [reply] [d/l] |