Re^3: What perl operations will consume C stack space?

by hv (Prior)
on Feb 27, 2006 at 11:01 UTC


in reply to Re^2: What perl operations will consume C stack space?
in thread What perl operations will consume C stack space?

Sorry, I can't answer that without fuller definitions of "legitimate" and "cannot be better expressed" - which does my previous example fall foul of?

Any pattern that repeats a "non-simple" expression will consume C-stack space on each repetition. At least 3 of the 4 bugs linked to the metabug #24274 arose from people solving real-world tasks, and there are more in the bugs database that should also be linked to the metabug; the more recent ones often involve searching for particular structures in a genome.

Here's another common fragment that invokes the problem:

    $_ = sprintf q{"%s"}, "a" x 32768;
    /"((?:\\.|[^"])+)"/;

though it consumes stack at only half the rate of the /(ab*)*/ variety.

Hugo

Re^4: What perl operations will consume C stack space?
by BrowserUk (Patriarch) on Feb 27, 2006 at 14:08 UTC

    I'm probably missing something, but I don't see the circumstances under which /(ab*){32766}/ couldn't be replaced by /(ab+|a){32766}/ with the same outcome (except the core dump), but much less backtracking?

    Likewise, is there anything that /"((?:\\.|[^"])+)"/ would match that /"([^"]+)"/ wouldn't?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Your first factorisation is valid: (ab*){32766} and (ab+|a){32766} are equivalent, as they match and fail to match exactly the same strings (modulo any deficiencies in the Perl interpreter). Your second factorisation is not equivalent, as the first regex allows for backslashed items, which the second doesn't:

          #!/usr/bin/perl -w
          use strict;

          my @regexen = (
              qr/"((?:\\.|[^"])+)"/,
              qr/"([^"]+)"/,
          );

          for (<DATA>) {
              chomp;    # drop the newline so each result stays on one line
              for my $r (@regexen) {
                  print "$_\t";
                  print "\t$r";
                  if (/$r/) {
                      print "\tMatch\t($1)\n";
                  }
                  else {
                      print "\tNo match.\n";
                  }
              }
          }

          __DATA__
          "foo\"bar"
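The first point, that (ab*){n} and (ab+|a){n} accept exactly the same strings, can be spot-checked by brute force. The following is only a sketch: it anchors both patterns, uses non-capturing groups, and tries small repetition counts over every string of a's and b's up to a short length. The helper names `all_strings` and `count_mismatches` are made up for this illustration, and a finite check like this is evidence, not a proof.

```perl
use strict;
use warnings;

# Every string over the alphabet {a, b} of length 0 .. $max_len.
sub all_strings {
    my ($max_len) = @_;
    my @all  = ('');
    my @prev = ('');
    for (1 .. $max_len) {
        @prev = map { ($_ . 'a', $_ . 'b') } @prev;
        push @all, @prev;
    }
    return @all;
}

# Count strings on which the two anchored patterns disagree,
# for repetition counts 1 .. $max_reps.
sub count_mismatches {
    my ($max_len, $max_reps) = @_;
    my $mismatches = 0;
    for my $n (1 .. $max_reps) {
        my $re1 = qr/\A(?:ab*){$n}\z/;
        my $re2 = qr/\A(?:ab+|a){$n}\z/;
        for my $s (all_strings($max_len)) {
            my $m1 = $s =~ $re1 ? 1 : 0;
            my $m2 = $s =~ $re2 ? 1 : 0;
            $mismatches++ if $m1 != $m2;
        }
    }
    return $mismatches;
}

print "mismatches: ", count_mismatches(8, 4), "\n";
```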

      Corion answers your second point; on the first point, refactoring to /(ab+|a)+/ reduces stack usage but does not eliminate it: for me, "a" x $n cores with /(ab*)+/ at n=10080 and with /(ab+|a)+/ at n=20157, so it appears to save exactly half of the stack usage.
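Figures like these can be reproduced without taking down the shell by running each match in a forked child and checking whether the child died on a signal. This is a minimal sketch under some assumptions: a Unix-like system with a real fork, failure that is monotonic in n (so a binary search is valid), and helper names (`survives`, `first_failure`) invented for the example. A perl whose regex engine does not recurse on the C stack should simply report no failure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run one match in a forked child so a stack overflow kills only the child.
# Returns true if the child was not terminated by a signal.
sub survives {
    my ($re, $n) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        my $s       = "a" x $n;
        my $matched = $s =~ $re;    # result is irrelevant; only survival matters
        exit 0;
    }
    waitpid($pid, 0);
    return ($? & 127) == 0;         # low 7 bits hold the terminating signal
}

# Binary search for the smallest n in [$lo, $hi] that kills the child.
# Returns undef if even $hi survives.
sub first_failure {
    my ($re, $lo, $hi) = @_;
    return undef if survives($re, $hi);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if (survives($re, $mid)) { $lo = $mid + 1 }
        else                     { $hi = $mid }
    }
    return $lo;
}

for my $re (qr/(ab*)+/, qr/(ab+|a)+/) {
    my $n = first_failure($re, 1, 32766);
    printf "%-16s %s\n", $re,
        defined $n ? "first failure at n=$n" : "no failure up to n=32766";
}
```

The exact thresholds depend on the perl build and on the stack limit in force, so the numbers quoted above should not be expected to reproduce elsewhere.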

      As TimToady mentioned, anything that quantifies "a compound submatch of varying length" will trigger it. (In fact even "compound" does not seem required, as /(a+?)+/ attests.)

      Hugo

        On my system using 5.8.6, /(ab*){$n}/ cores with $n == 21166, whereas /(ab|a){$n}/ completes successfully for all values of $n up to the repetition limit of 32766. If I drop the stack reservation to 8 MB (similar to the default on Linux?), then I get a corresponding breakpoint of 10582.

        That seems to indicate that, on my system, the regex engine requires about 792 bytes of stack for each repetition (8 MB / 10582). That seems a lot of state to preserve on the stack, but I know nothing about how the regex engine is implemented, so perhaps it isn't.
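The 792-byte estimate follows from dividing the 8 MB stack reservation by the breakpoint of 10582. A small sketch of that arithmetic (the function name `bytes_per_repetition` is made up here; it also ignores whatever stack is already in use before the match starts):

```perl
use strict;
use warnings;

# Rough stack cost per repetition: total stack divided by the
# repetition count at which the match first overflowed.
sub bytes_per_repetition {
    my ($stack_bytes, $breakpoint) = @_;
    return $stack_bytes / $breakpoint;
}

my $per_rep = bytes_per_repetition(8 * 1024 * 1024, 10582);
printf "%.1f bytes per repetition\n", $per_rep;            # 792.7

# Working backwards from the first breakpoint of 21166:
printf "%.1f MB implied for the original run\n",
    21166 * $per_rep / (1024 * 1024);                       # 16.0
```

The second figure would be consistent with an original reservation of about 16 MB, though the post does not state what the original value was.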

        It does make me wonder whether repetition counts, at least in these fairly simple cases, couldn't be fulfilled by a tail-recursive routine to alleviate the stack growth?

        If not, isn't there some scope for putting a check of the form die 'Not enough stack' if reps > stacksize / 792?
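As a user-level illustration of the proposed check (not how the regex engine itself would do it), here is a hedged sketch. `check_reps` and `BYTES_PER_REP` are hypothetical names, the 792-byte figure is the system-specific estimate from above, and the caller is assumed to know the stack size. It only makes sense for a fixed {n} count; with + or * the number of repetitions depends on the input.

```perl
use strict;
use warnings;

# Estimated stack cost per repetition; taken from the measurement
# above and specific to that system and perl build.
use constant BYTES_PER_REP => 792;

# Die before attempting a match whose repetition count is estimated
# to need more stack than is available.
sub check_reps {
    my ($reps, $stack_bytes) = @_;
    die "Not enough stack\n" if $reps > $stack_bytes / BYTES_PER_REP;
    return 1;
}

my $stack = 8 * 1024 * 1024;    # assumed stack size in bytes
for my $reps (10_000, 20_000) {
    if (eval { check_reps($reps, $stack) }) {
        print "$reps repetitions: ok\n";
    }
    else {
        my $err = $@;
        chomp $err;
        print "$reps repetitions: refused ($err)\n";
    }
}
```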


