PerlMonks  

Looking for advice on how to tune stack size for threads

by fx (Pilgrim)
on Dec 02, 2010 at 16:48 UTC ([id://874944])

fx has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

The Perl threads documentation (found at http://search.cpan.org/~jdhedden/threads-1.81/lib/threads.pm) states:

THREAD STACK SIZE...By tuning the stack size to more accurately reflect your application's needs, you may significantly reduce your application's memory usage, and increase the number of simultaneously running threads.

Now, search as I might, I cannot find any advice or pointers on exactly how to tune the stack size. Can anyone offer any advice on where I should look?

fx, Infinity is Colourless

Re: Looking for advice on how to tune stack size for threads
by BrowserUk (Patriarch) on Dec 02, 2010 at 20:00 UTC

    I'd set it to: stack_size => 4096 and then try it.

    Actually, I've modified my perl binary so that the 'default' value used when no stack size is explicitly stated is 4096, this being the minimum that can be used on my system. So far, I've never encountered a problem due to stack.

    The reason you can get away with this is that, for the most part, Perl doesn't use the C/processor stack.

    Perl manages its own stack (actually: stacks), and these are allocated from the heap. Until recently, very complex regexes could consume prodigious amounts of the C stack, but then p5p (I think: dave_the_m?) converted the then-recursive regex engine to an iterative one, thus removing perl's last big stack dependency.
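    A minimal sketch illustrating the point above, assuming a perl built with ithreads: the sub below recurses 10,000 levels deep in pure Perl inside a thread created with a deliberately small 128KB stack (an arbitrary size, chosen to stay comfortably above common system minimums). Because each Perl-level call is recorded on perl's own heap-allocated stacks rather than on the C stack, the recursion depth should not be limited by the thread's stack_size.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Pure-Perl recursion: each call pushes onto perl's internal (heap) stacks,
# not onto the C stack of the thread running it.
sub depth {
    my ($n) = @_;
    no warnings 'recursion';   # silence "Deep recursion" warnings past 100 levels
    return $n == 0 ? 0 : 1 + depth($n - 1);
}

# 128KB is an illustrative size, not a recommendation.
my $thr = threads->create({ 'stack_size' => 128 * 1024 }, \&depth, 10_000);
my $stack   = $thr->get_stack_size();   # size actually used by the thread
my $reached = $thr->join();

print "recursed $reached levels in a thread with a $stack-byte stack\n";
```

    Recursion inside XS/C code (the case described in the next paragraph) is a different matter: that does consume the thread's C stack.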

    However, if you use modules with badly written XS or C components that either allocate large items (or large quantities of items) on their stacks, or enter unbridled recursion, they may cause your threads to run out of (thread) stack if you set it too low.

    Personally, I've never encountered such a problem, even with my extreme minimalism, but I'm also quite choosy about the modules I use.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Yes, trial and error seems to be the way here!....Never mind...

      My original code had lots of different modules so I was thinking that the stack size needed to be quite high. Then, as so often is the way with Perl, I found that the more I increased the number of threads, the more randomly the program crashed. I've run into so many thread-unsafe (or perhaps thread-unsure!) modules that a lot of my code is now system calls out to Unix command line utilities that do the same thing.

      Yes, performance takes a hit but, in this case at least, I'm more interested in running many slow running processes concurrently so it's not all bad....

      Thanks for your reply!

      fx, Infinity is Colourless

        Yes, trial and error seems to be the way here!....Never mind...

        Actually, you missed my point completely. Perhaps I didn't make it clear enough.

        The value of the stack size parameter will almost never be the source of, or the cure for, program crashes with threaded perl code.

        The default value (on Windows; it may vary on other platforms) is 16MB. This is a (ludicrously) oversized allocation, as attested by the fact that, as I suggested above, having set it 4096 times smaller at just 4096 bytes, I've yet (on 5.10+) to find a piece of Perl code that requires more stack.

        I did succeed in making a thread crash through stack exhaustion on 5.8.something by using a carefully constructed regex; but that doesn't work any more.

        For some of the background to this, please see the node "Use more threads." It was my explorations detailed in that thread that caused the stack_size parameter to be added to threads, the motivation being to allow the programmer to reduce the default size in order to save memory. I've never seen nor heard of an occasion when it needed to be increased.

        My original code had lots of different modules so I was thinking that the stack size needed to be quite high.

        That's a quite common misperception.

        Even the most memory-hungry pure Perl module (Date::Manip is an old favourite for this purpose) requires almost no processor stack.

        And the same is true for the vast majority of well-written modules with XS/C components.

        Then, as so often is the way with Perl, I found that the more I increased the number of threads, the more randomly the program crashed.

        No disrespect meant, but this is almost always down to programming errors as a result of programmer misunderstanding.

        Many monks get a little tetchy when people post that their program "crashes", "fails", "prints errors" or "doesn't work", and for the same reason, stories of such failures without detailed error messages and code to demonstrate them are frustrating: they leave no avenue through which to address them.

        I've run into so many thread-unsafe (or perhaps thread-unsure!) modules that a lot of my code is now system calls out to Unix command line utilities that do the same thing.

        Yes, performance takes a hit but, in this case at least, I'm more interested in running many slow running processes concurrently so it's not all bad.

        Hm. Well, if it works for you, that's great, but it would be a whole lot better for everyone if you would post code that demonstrates the problems. That way, we can either work out what you are doing wrong, or detail the problems sufficiently that a bug report can be raised so they get fixed.

        My conservative estimate is that >80% of "threads problems" raised here are quickly identified as programmer errors that usually have relatively simple fixes, though it does often mean correcting some misconceptions and encouraging somewhat different working practices, especially for programmers used to other flavours of threading in other languages.

        The upside of that is that the working code is usually far simpler than the buggy code originally posted.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Looking for advice on how to tune stack size for threads
by Illuminatus (Curate) on Dec 02, 2010 at 18:06 UTC
    Thread stack tuning is more an art than a science. At least in C you know how big things on the stack are. If your threads are fairly simple (i.e., they don't create arbitrarily deep call stacks), you can usually make a rough guess, add 20%, and be done with it. Otherwise, I usually have to start with the default and drop it in 256k increments until it dies, then go back to the last success and add 20-40% (depending on the complexity of boundary-condition handling).
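    The step-down procedure above can be sketched in Perl as follows. This is an illustration only: works is a hypothetical callback you would supply, which runs a representative workload at a given stack size and reports whether it survived. Since running out of stack typically crashes the whole process, each trial would in practice need to run in a separate process (e.g. a child perl), not in a thread of the tuner itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step-down search: start high, drop by $step until the workload fails,
# then add headroom to the last size that worked.
# $works is a hypothetical callback: given a stack size in bytes, it runs a
# representative workload (ideally in a child process) and returns true on success.
sub tune_stack_size {
    my (%arg) = @_;
    my $size   = $arg{start};
    my $step   = $arg{step}   // 256 * 1024;   # 256k decrements
    my $margin = $arg{margin} // 0.20;         # 20-40% headroom
    my $works  = $arg{works};

    die "workload fails even at the starting size\n" unless $works->($size);
    my $last_ok = $size;
    while ($size - $step > 0) {
        $size -= $step;
        last unless $works->($size);
        $last_ok = $size;
    }
    return int($last_ok * (1 + $margin));
}

# Usage with a simulated workload that needs about 700KB of stack.
my $needed = 700 * 1024;
my $tuned  = tune_stack_size(
    start  => 16 * 1024 * 1024,   # e.g. the 16MB Windows default
    margin => 0.25,
    works  => sub { $_[0] >= $needed },
);
print "suggested stack_size: $tuned bytes\n";
```

    With the simulated workload, the last passing size is 768KB (3 x 256k), and 25% headroom on top of that gives 983040 bytes.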

    fnord

      "Thread stack tuning is more an art".

      I hate art, and was suspecting that the answer might actually be something like this. What I really wanted (hoped, wished for deep down...) was a nice hard-and-fast formula. I didn't really think there'd be one, and it seems that is correct.

      Thanks for your reply. It backed up my own suspicions. Trial and error it is!!!!

      fx, Infinity is Colourless

Re: Looking for advice on how to tune stack size for threads
by sundialsvc4 (Abbot) on Dec 02, 2010 at 18:27 UTC

    I strongly advise keeping the total number of threads small. If you need a work queue, or even a full request-management infrastructure, such things already exist on CPAN. “A thread” and “a request” are not the same thing. A single request-processing thread might live for many months and process millions of requests during its venerable lifetime.
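    A minimal sketch of the "few long-lived threads, many requests" pattern, assuming a threaded perl and the core Thread::Queue module: four workers pull items from a shared queue until each sees an undef end-of-work marker. Squaring a number stands in for real request handling, and the 128KB stack size is an illustrative assumption, not a recommendation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads ('stack_size' => 128 * 1024);   # illustrative size only
use Thread::Queue;                          # load after threads

my $N_WORKERS = 4;
my $requests  = Thread::Queue->new();
my $results   = Thread::Queue->new();

# Each worker lives for the whole run and handles many requests.
sub worker {
    while (defined(my $req = $requests->dequeue())) {
        $results->enqueue($req * $req);      # stand-in for real request handling
    }
    return;
}

my @pool = map { threads->create(\&worker) } 1 .. $N_WORKERS;

$requests->enqueue(1 .. 100);                # 100 requests, only 4 threads
$requests->enqueue((undef) x $N_WORKERS);    # one end-of-work marker per worker
$_->join() for @pool;

my ($handled, $sum) = (0, 0);
while (defined(my $r = $results->dequeue_nb())) {
    $handled++;
    $sum += $r;
}
print "$N_WORKERS threads handled $handled requests (sum of squares: $sum)\n";
```

    The number of threads stays fixed no matter how many requests arrive; only the queue grows.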

Re: Looking for advice on how to tune stack size for threads
by choroba (Cardinal) on Dec 02, 2010 at 17:18 UTC
    Look 5 lines below the passage you quoted in the documentation: there you can see how to set the value. Then you have to try different values to "tune" it, i.e. to find the value that suits you best.
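    For reference, the threads documentation describes several ways to set the value. A sketch, assuming a threaded perl, with sizes chosen arbitrarily: a default at load time, a new default at run time, and a per-thread override. threads may round a requested size up to a page multiple, or raise it to the system minimum with a warning, so it is worth reading back the value actually used with get_stack_size.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 1. Set the default for all threads when the module is loaded (512KB here).
use threads ('stack_size' => 512 * 1024);

# 2. Change the default at run time; the previous default is returned.
my $old = threads->set_stack_size(256 * 1024);

# 3. Override the default for a single thread when creating it.
my $thr = threads->create({ 'stack_size' => 128 * 1024 }, sub { return 42 });

my $thr_size = $thr->get_stack_size();   # size actually used by this thread
my $result   = $thr->join();

printf "previous default: %d, current default: %d, this thread: %d, result: %d\n",
    $old, threads->get_stack_size(), $thr_size, $result;
```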

      Thanks for your reply. Yes, I am aware of how to set and retrieve the stack size. My question was intended to be slightly different. Knowing how to set and get the size is all well and good, but I'm after advice as to _what_ I should be setting it to in the first place, and how I should estimate the correct value...

      fx, Infinity is Colourless

Re: Looking for advice on how to tune stack size for threads
by Khen1950fx (Canon) on Dec 02, 2010 at 21:47 UTC
    I tried a different approach to controlling stack size. Using Thread::Stack, you can control the number of threads by pushing a list onto, or popping a scalar off, the top of the stack:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Thread::Stack;

    my $s = new Thread::Stack;
    $s->push(qw(thr1 thr2 thr3 thr4 thr5));
    print my $size = $s->size, "\n";
    $s->pop("thr1");
    print $size = $s->size, "\n";

      Thread::Stack has nothing whatsoever to do with the OPs question.

      And the way you are using it in your "example" is completely useless.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks for your reply. I already have a method, along similar lines, for controlling the number of threads. However, my question was intended to be slightly different. I was asking for advice as to where to look for info on setting the actual value for the stack size.

      fx, Infinity is Colourless

Re: Looking for advice on how to tune stack size for threads
by sundialsvc4 (Abbot) on Dec 03, 2010 at 17:03 UTC

    Since the thread stack is virtual storage anyway, a 16MB limit is not unheard-of or particularly costly. The only problem might arise if you have so many threads that you run out of the few-gigabyte virtual address space. And if that is the case, the stack size is hardly to blame...

    There is far more grief from a stack that is too small, or from a thread count that is too large. If you conclude that the stacks must be shrunk “to make room,” your real problem is most likely that there are way too many of them. JM2CW.

      Sorry, but this is another case of a little knowledge being a dangerous thing.

      The main problem with starting threads with a default stack reserve of 16MB is not (just) the waste of 15.996MB of virtual address space per thread that will never be used for stack and cannot then be used for anything else, annoying and completely unnecessary as that is.

      Far more insidious and consequential is the effect it has on the fragmentation of the virtual address space it leaves available to the rest of the program as heap space.

      You see, when a chunk of virtual address space is reserved, although it doesn't actually get allocated to the process or within the backing store (swap space), it is removed from the pool of virtual address space that can subsequently be allocated to anything else, e.g. heap.

      Now, whilst (say) 4 threads each reserving its 16MB of VM doesn't sound like much of a problem, being only about 3% of a typical 32-bit process's VM space, the problem arises when you look at where, within that 2GB VM address map, those 4x16MB chunks get allocated.

      Because thread stacks get allocated at runtime, after many chunks of heap (used by the code and data for modules loaded at start-up) have already been allocated, those 16MB chunks invariably have to be allocated somewhere in the middle of the virtual address map. The effect of that is to fragment the total pool of allocatable virtual address space in a way that can severely restrict the size of any subsequent single allocation.

      In English, that means it severely limits the size of the biggest array or hash you can create. Even though you have plenty of unused VM to accommodate the elements of that array, perl needs to be able to allocate a single, contiguous chunk of memory for the AV component of the structure, and because the 3 or 4 chunks of unused and unreusable stack space are spread throughout the memory map, there is no single chunk big enough to hold it.

      However, if those thread stacks only reserve as much VM as they are actually likely to use, they will rarely, if ever, get allocated in the middle of the VM address map, because it is far easier to find an unallocated space in low memory to accommodate a 1 or 2 page allocation than a 4096-page allocation.
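      The figures quoted in this reply, and the 16MB-versus-4096-bytes comparison earlier in the thread, can be checked with a little arithmetic. A small sketch, assuming 4KB pages and a 2GB user address space for a typical 32-bit process:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $KB = 1024;
my $MB = 1024 * $KB;
my $GB = 1024 * $MB;

my $default_stack = 16 * $MB;   # default reserve discussed in this thread
my $tiny_stack    = 4 * $KB;    # the 4096-byte setting
my $page          = 4 * $KB;    # assumed page size
my $user_space    = 2 * $GB;    # assumed 32-bit user address space

my $ratio     = $default_stack / $tiny_stack;              # how many times smaller
my $wasted_mb = ($default_stack - $tiny_stack) / $MB;      # unused reserve per thread
my $pages     = $default_stack / $page;                    # pages in one 16MB reserve
my $pct_four  = 100 * 4 * $default_stack / $user_space;    # 4 threads' share of 2GB

printf "16MB is %d times 4KB\n", $ratio;
printf "unused reserve per thread: %.3fMB\n", $wasted_mb;
printf "16MB = %d pages of 4KB\n", $pages;
printf "4 x 16MB = %.1f%% of 2GB\n", $pct_four;
```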

      To demonstrate the difference this makes, below are the (simplified) VM memory maps of the same perl script that creates 4 threads.

      The first uses the default thread stack allocation of 16MB. The memory maps are shown side by side before and after the threads are spawned:

      Before the threads are spawned          After the threads are spawned
      0x00010000 -                            0x00010000 -
      0x00110000 threads.dll                  0x00110000 threads.dll
      0x00400000 perl.exe                     0x00400000 perl.exe
      0x0140b000 thread 0x1674 stack area     0x0140b000 thread 0x1674 stack area
      0x015d0000 default process heap         0x015d0000 default process heap
                                              0x04f8e000 thread 0x05d4 stack area
                                              0x0636e000 thread 0x0dc8 stack area
                                              0x0774e000 thread 0x1bcc stack area
                                              0x08b2e000 thread 0x1198 stack area
      234MB contiguous free space             116MB contiguous free space
      0x10000000 guard32.exe                  0x10000000 guard32.exe
      ...                                     ...

      Notice how the allocation of the (required) 16kb of stack space has effectively halved the contiguous free space, from 234MB to 116MB, meaning that the largest array or hash that can be allocated has also been halved.
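      As a rough illustration of what halving the contiguous free space means, assuming a 32-bit perl with 4-byte pointers as in these maps: an array's element block is one contiguous run of SV pointers, one per element, so the free-space figures translate into a ceiling on array length. The estimate ignores perl's over-allocation when arrays grow and the SVs themselves (allocated separately), so real limits would be lower.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Rough upper bound on the number of elements whose pointer block (one SV*
# per element) fits in a contiguous free region. Ignores over-allocation on
# growth and the SVs themselves, so real limits are lower.
sub max_array_elements {
    my ($contiguous_bytes, $ptrsize) = @_;
    return int($contiguous_bytes / $ptrsize);
}

my $MB    = 1024 * 1024;
my $ptr32 = 4;   # assumed: 32-bit perl, 4-byte pointers
for my $free_mb (234, 116) {
    printf "%dMB contiguous free: at most ~%d array elements (32-bit)\n",
        $free_mb, max_array_elements($free_mb * $MB, $ptr32);
}
printf "(this perl uses %d-byte pointers)\n", $Config{ptrsize};
```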

      Now the same program, except that it uses a 4k thread stack allocation:

      Before the threads are spawned          After the threads are spawned
      0x00010000                              0x00010000
      0x00120000 threads.dll                  0x00120000 threads.dll
                                              0x0017e000 thread 0x1dc2 stack area
                                              0x0034e000 thread 0x1f70 stack area
                                              0x0039e000 thread 0x0f90 stack area
                                              0x003ee000 thread 0x15a4 stack area
      0x00400000 perl.exe                     0x00400000 perl.exe
      0x0140b000 thread 0x1a4c stack area     0x0140b000 thread 0x1a4c stack area
      0x01530000 default process heap         0x01530000 default process heap
      234MB contiguous free space             234MB contiguous free space
      0x10000000 guard32.exe                  0x10000000 guard32.exe
      ...                                     ...

      Notice how, because the stack reservations requested are so much smaller, the VM allocator has managed to tuck the 4 stack areas away into otherwise unused areas of low memory, leaving the contiguous free space completely untouched. This means that the maximum size of large data structures that the program can deal with, before having to resort to costly disk-based solutions, remains unaffected.
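      To try a comparison like this on your own system, a script along the following lines could be used. It is a hypothetical reconstruction (the original script was not posted) and assumes a threaded perl: it starts four threads with the stack size given as the first argument (0 meaning threads' default) and can keep them alive for the number of seconds given as the second argument, while you inspect the process with a tool such as Sysinternals VMMap on Windows or /proc/<pid>/maps on Linux.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical reproduction script (not the original):
#   perl stackmap.pl [stack_size_bytes] [hold_seconds]
# A stack size of 0 means "use threads' default".
my $stack_size = @ARGV > 0 ? $ARGV[0] : 0;
my $hold       = @ARGV > 1 ? $ARGV[1] : 0;

print "pid $$: starting 4 threads\n";
my @thr = map {
    threads->create({ 'stack_size' => $stack_size },
                    sub { sleep $hold; return threads->tid() });
} 1 .. 4;

# Read back the size each thread actually got (0 = system default).
my @granted = map { $_->get_stack_size() } @thr;
my @tids    = map { $_->join() } @thr;
for my $i (0 .. $#tids) {
    printf "thread %d: stack size %s\n", $tids[$i], $granted[$i] || 'system default';
}
```

      Running it once with no arguments and once with a small size (threads will raise a too-small request to the platform minimum, with a warning) gives the two cases to compare.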

      You see, notional knowledge, read somewhere and regurgitated at random intervals, is no substitute for actually understanding the details of what goes on under the covers. Likewise, notional wisdom based on aphorisms like 'optimisation is the root of all evil' is no substitute for understanding: omitting the word 'premature', or failing to understand that 'premature' does not mean 'any', is the wrong kind of laziness.

      There's an old saying applicable here, as in many fields: "take care of the pennies, and the pounds will take care of themselves". Throwing big numbers at memory allocations "because it's only virtual memory" and/or "because memory is cheap" simply doesn't cut it when the cost of the transition from memory-based storage to disk-based storage continues to be so high. Even in these days of relatively cheap SSDs, the multipliers involved have only dropped from 3 to 2 orders of magnitude, and there are no signs of that improving any time soon.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

Node Type: perlquestion [id://874944]
Approved by Corion
Front-paged by Corion