XS Error: Segfault with B::Hooks::AtRuntime

by Ovid (Cardinal)
on Aug 05, 2022 at 10:01 UTC

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

My module, MooseX::Extended, is quickly becoming popular and is now being used in production at some companies. However, one person is reporting intermittent segfaults. This appears to be related to my using B::Hooks::AtRuntime to avoid the need to add __PACKAGE__->meta->make_immutable; to the end of every Moose module. There's not much XS code involved, but my XS knowledge is even worse than my C.

Paul "LeoNerd" Evans commented on IRC:

14:18 LeoNerd: #0 Perl_SvREFCNT_dec_NN (sv=0xa65636e6174736e, my_perl=0x55921df002a0) at inline.h:242 <== that looks very much like a bad sv address
14:20 LeoNerd: Not terribly clear where that comes from.. the next context frame is popeval, which suggests stack unwind. Possibly at this point some accessing of bad memory
14:20 LeoNerd: valgrind might help.

Can anyone with XS knowledge help me? As far as I can tell, the code is still solid for prod. I'm wondering if this has something to do with the effectively random order of global destruction because this is just being triggered by a compilation test. (That's just speculation and could be a red herring).

Note: If anyone else experiences this, the workaround is to simply exclude the automatic immutable behavior and add it manually to your MooseX::Extended classes.

Replies are listed 'Best First'.
Re: XS Error: Segfault with B::Hooks::AtRuntime
by dave_the_m (Monsignor) on Aug 06, 2022 at 15:08 UTC
    Well, the crash is happening at the point where the perl interpreter has just finished executing the body of a require'd file (the require occurring as part of a 'use'). The interpreter is popping the CXt_EVAL context frame off the context stack (which was pushed on when the require'd file was about to be compiled). One step of popping the context is to decrement the refcount of a temporary SV pointed to from within the context struct (which happens to hold the name of the file being compiled - or possibly the package name; can't remember which). The pointer stored in the context struct is obviously corrupt - it looks like part of the context struct has been overwritten with the text "nstance", as has been pointed out.

    What the cause of this is, I don't know. Given that you're doing weird stuff with delaying execution, my initial instinct was that the context stack pointer has been decremented and then some other code has pushed a new context frame on, overwriting the eval context - all before the interpreter has actually returned from doing the require. However, in that case I would expect a new context frame (even of a different type than CXt_EVAL) to be mainly full of pointers and the like - not full of literal text. So it looks like a deeper problem with a rogue string pointer somewhere.


      I can't solve the problem, but that info turned out to be sufficient for me to guess where to poke. I can reproduce a segfault in just 7 lines of code.

      -------------------------
      use lib '.';
      use Demo;
      -------------------------
      package Demo;
      use MooseX::Extended;
      require Stuff;
      -------------------------
      package Stuff;
      use MooseX::Extended;
      With this code, perl -wc segfaults, and valgrind throws a 455-line tantrum.
        In general terms, that valgrind output shows that calling a sub while running code via call_after() (which requires a new CXt_SUB context frame to be pushed) grows the context stack (by reallocating it). Sometime later, when exiting a use/require, the CXt_EVAL frame is accessed using the old context stack address (the smaller stack that was freed when a larger one was allocated). So maybe something is holding on to a context stack pointer when it shouldn't - either in core or XS.


      I've now fully diagnosed the issue and have a small reproducible test case. The issue is the context stack being reallocated when deeply nested sub calls are made from after_runtime(), which is called from a destructor while exiting the eval scope from the require. That exiting code isn't expecting the context stack to be reallocated from under it. Normally, when perl itself calls out to code from things like destructors, tied method calls etc., it uses a new temporary set of stacks (argument, context etc.). after_runtime() needs to do something similar. This is achieved via the PUSHSTACKi() macro. Look at a distribution like Async-Interrupt for an example.

      Here's the reproducible code. The recursive sub is called enough times to trigger a context stack grow/realloc.

      -------------------------
      use lib '.';
      use Foo;
      -------------------------
      package Foo;
      use Bar;
      1;
      -------------------------
      package Bar;
      use B::Hooks::AtRuntime 'after_runtime';
      sub recurse {
          my $depth = shift;
          return if $depth < 0;
          recurse($depth -1);
      }
      sub import {
          after_runtime { recurse(20); }
      }
      If this is run against a perl that has been built with -DDEBUGGING, you'll see the following assertion failure, which is Perl_cx_popeval() detecting that the current cx pointer has changed underneath it.
      perl: inline.h:2921: Perl_cx_popeval: Assertion `CxTYPE(cx) == CXt_EVAL' failed.
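
The diagnosis above can be illustrated with a small stand-alone C model. This is only a sketch: it does not use the Perl API, and every name in it (Frame, Stack, push, call_direct, call_on_temp_stack, main_stack_grew) is hypothetical. It models a growable frame stack, a deeply nested callback, and the two ways of running that callback: directly on the current stack, or on a temporary stack that is switched in and out, which is the idea the post attributes to PUSHSTACKi(). Growth here always copies into a fresh block, whereas a real realloc may or may not move the storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for perl's context frames and context stack. */
typedef struct { int type; } Frame;
typedef struct { Frame *frames; size_t len, cap, grows; } Stack;

enum { FRAME_EVAL = 1, FRAME_SUB = 2 };

static Stack *cur;  /* the stack that pushes currently go to */

static void *xmalloc(size_t n) {
    void *p = malloc(n);
    if (!p) abort();
    return p;
}

static void stack_init(Stack *s, size_t cap) {
    s->frames = xmalloc(cap * sizeof(Frame));
    s->len = 0;
    s->cap = cap;
    s->grows = 0;
}

/* Push a frame, growing the stack when full. Any Frame pointer taken
 * before a growth step refers to freed memory afterwards. */
static void push(int type) {
    if (cur->len == cur->cap) {
        size_t ncap = cur->cap * 2;
        Frame *bigger = xmalloc(ncap * sizeof(Frame));
        memcpy(bigger, cur->frames, cur->len * sizeof(Frame));
        free(cur->frames);
        cur->frames = bigger;
        cur->cap = ncap;
        cur->grows++;
    }
    cur->frames[cur->len++].type = type;
}

static void pop(void) {
    assert(cur->len > 0);
    cur->len--;
}

/* A callback that nests deeply, like recurse() in the Perl reproducer. */
static void deep_callback(int depth) {
    if (depth < 0) return;
    push(FRAME_SUB);
    deep_callback(depth - 1);
    pop();
}

/* Run the callback directly on the current stack. */
void call_direct(int depth) { deep_callback(depth); }

/* Run the callback on a temporary stack, then switch back. */
void call_on_temp_stack(int depth) {
    Stack temp, *saved = cur;
    stack_init(&temp, 4);
    cur = &temp;
    deep_callback(depth);
    cur = saved;
    free(temp.frames);
}

/* Returns 1 if the main stack's storage was reallocated while fn ran. */
int main_stack_grew(void (*fn)(int), int depth) {
    Stack main_stack;
    stack_init(&main_stack, 4);
    cur = &main_stack;
    push(FRAME_EVAL);   /* the frame the "require" will pop later */
    fn(depth);          /* callback runs before that pop happens */
    int grew = main_stack.grows > 0;
    free(main_stack.frames);
    return grew;
}
```

In this model, main_stack_grew(call_direct, 20) reports that the main stack was reallocated, so code still holding a pointer to the FRAME_EVAL frame would be reading freed memory. main_stack_grew(call_on_temp_stack, 20) reports no reallocation, because all growth happens on the temporary stack.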


Re: XS Error: Segfault with B::Hooks::AtRuntime
by kikuchiyo (Hermit) on Aug 05, 2022 at 12:01 UTC
    Not necessarily relevant, but that bad address consists of all lowercase letters ("ecnatsn") apart from the leading 0xa. Looks like gibberish, but it supports the theory of accessing bad memory.

      Looks like gibberish

      You forgot to take LE byte order into account. It's actually "nstance␊" (the final letters of "instance" or "Instance" followed by a LF).

      $ perl -e'
         use DDP print_escapes => 1;
         p ${\ pack "J", 0xa65636e6174736e };
      '
      "nstance\n"