Knowing when a code string contains code that will be executed at compile-time/end?

perlancar has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Knowing when a code string contains code that will be executed at compile-time/end? by davido (Cardinal) on Aug 15, 2020 at 16:21 UTC
Despite my misgivings over the thing you're trying to do -- its security implications, and the impression I get that you're dealing with an XY Problem -- I played with this a little last night.
I had some theories. If I could tie the `${^GLOBAL_PHASE}` variable, using Tie::StdScalar, to a class that warns when entering a phase other than RUN, I might be able to detect what is in the pad at those other phases, to determine if other code has been executed. It was a long shot, and ultimately didn't work: tying `${^GLOBAL_PHASE}` is disallowed because it is read-only. It would only have been a kludge anyway. I might have been able to throw an exception on entering CHECK or START, but those phases should occur any time code is evaled, so that would have led to the next problem of determining whether there was custom code in those stages. That's probably nearly impossible. I hadn't really thought that part through, and it would probably never work.
Another possibility might have been using attributes, which are phase-aware. But then how to trigger them? It didn't seem like there was a solution there either.
Plus your goal is to disallow code that has custom code in BEGIN, END, DESTROY, CHECK, INIT, and UNITCHECK phases, I think. But UNITCHECK and END don't take place within the eval. In fact, if there's an END in the eval, it gets queued up to run at end of runtime, not at end of the eval scope (see perlmod). Additionally, you really want to detect these phases before any code runs in them. I just don't see it happening.
The alternative is to parse Perl, but then you deal with the Halting Problem. Most naive approaches, such as a pattern match, can easily be foiled by someone embedding a simple `tr///` within the eval to hide a second level of eval behind a ROT13 cipher.
It is nearly impossible to programmatically grok what is going to happen inside a string eval without doing a string eval. I just don't think you're going to find a solution that protects you from code running in phases you don't want it to. But if you're string evaling code a user gave you, global phases are the least of your risks.
Dave
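As an illustration of the ROT13 point, here is a minimal sketch of how a pattern match can be sidestepped. The `looks_suspicious` filter and both code strings are hypothetical, made up for this example; a real filter would surely also flag `eval`, but the same trick can be layered further.

```perl
use strict;
use warnings;

# Hypothetical naive filter: flag code that names a special block or `use`.
sub looks_suspicious {
    my ($code) = @_;
    return $code =~ /\b(?:BEGIN|END|INIT|CHECK|UNITCHECK|DESTROY|use)\b/ ? 1 : 0;
}

# The payload in plain sight.
my $plain = q{ END { print "end block ran\n" } };

# The same payload, ROT13-encoded; it is decoded with tr/// at run time
# and handed to a nested string eval.
my $hidden = q{
    (my $c = 'RAQ { cevag "raq oybpx ena\a" }') =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    eval $c;
};

printf "plain flagged:  %d\n", looks_suspicious($plain);
printf "hidden flagged: %d\n", looks_suspicious($hidden);

eval $hidden;    # registers the END block even though the filter passed it
print "eval finished\n";
# The END block's output appears only when the program exits.
```

The END block defined inside the eval is queued and runs at program exit, after "eval finished" has been printed, which matches the perlmod behaviour mentioned above.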
Re: Knowing when a code string contains code that will be executed at compile-time/end? (updated) by haukex (Archbishop) on Aug 15, 2020 at 08:03 UTC
"I am given a code string by user, and wants to know if the code contains statements or special blocks that are executed not in regular run time."
Impossible with a static parse. (Update: the classic links on "only `perl` can parse Perl": On Parsing Perl and Perl Cannot Be Parsed: A Formal Proof, and tye's reply to the latter.) Consider for example `eval(uc('end').'{...}')` or something more convoluted, like variations on `s s s END { ... } see` (Update 3: like `s x x qq s \Uens.'D{...}' xexe`).
The only way to do this safely is to limit the user to a subset of Perl that is statically parseable. See the new module `standard` and Sawyer's recent talk on it.
Update 2: As LanX points out, B::Deparse is not perfect, and using `standard` unfortunately doesn't protect you from those issues either. It would be possible to keep both the original `standard`-conforming string (checked to ensure it doesn't contain any `BEGIN`, `eval`, `do`, `use`, and so on) and its evaled coderef, e.g. in an object that overloads stringification and coderef-dereferencing, but that might be overkill depending on what you're trying to do.
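The idea of an object that carries both the source string and its compiled coderef could look roughly like the sketch below. The class name `CodeWithSource` is made up for illustration, and the vetting step (checking the string before it is evaled) is deliberately left out, so this shows only the overloading part.

```perl
use strict;
use warnings;

package CodeWithSource {    # hypothetical class name
    use overload
        '""'     => sub { $_[0]{source} },   # stringify to the original text
        '&{}'    => sub { $_[0]{code} },     # behave like a plain coderef
        fallback => 1;

    sub new {
        my ($class, $source) = @_;
        # A real implementation would vet $source here, before the eval.
        my $code = eval "sub { $source }"
            or die "could not compile: $@";
        return bless { source => $source, code => $code }, $class;
    }
}

my $obj = CodeWithSource->new(q{ return 2 * $_[0] });
print "source: $obj\n";                   # uses the stringification overload
print "result: ", $obj->(21), "\n";       # uses the coderef overload
```

With this shape the original string can be passed around or stored as-is, and nothing ever has to be reconstructed from the compiled code.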
Re^2: Knowing when a code string contains code that will be executed at compile-time/end? by LanX (Saint) on Aug 15, 2020 at 09:23 UTC
Your examples use run time actions to create END blocks. They should show up when applying B::Deparse.
Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery
Re^3: Knowing when a code string contains code that will be executed at compile-time/end? by haukex (Archbishop) on Aug 15, 2020 at 09:46 UTC
"They should show up when applying B::Deparse"
Nope, as the OP shows. Plus deparsing requires an eval of the string first.

```perl
use warnings;
use strict;
use B::Deparse;

my $code = q{ print "code\n"; s s s END { print "end\n" } see };
my $coderef = eval qq{ sub { $code } };
my $deparse = B::Deparse->new();
$deparse->ambient_pragmas(strict => 'all', warnings=>'all');
print $deparse->coderef2text($coderef), "\n";
__END__
{
    print "code\n";
    s/ /();/ee;
}
end
```
Re: Knowing when a code string contains code that will be executed at compile-time/end? (Update) by LanX (Saint) on Aug 15, 2020 at 09:16 UTC
I don't really understand what your goal is. If you eval a code string just keep the string.
It's nearly impossible to statically parse Perl for compile time code; it can even be hidden inside a regex.*
The only approach I could think of is to eval it and make the compiler die if it encounters a `use` or `BEGIN` or similar.◇ And I'm not even sure if that is feasible.
On another note: better not to rely on B::Deparse, it's only a 95% solution.
Cheers Rolf
*) See Re: Vulnerabilities when editing untrusted code... (Komodo)
◇) probably with `Safe`
Re^2: Knowing when a code string contains code that will be executed at compile-time/end? (Update) by perlancar (Hermit) on Aug 16, 2020 at 11:58 UTC
You might have hit the nail on the head here. I am working on code string templates, and want to separate the process that fills in the templates from the process that runs the filled-in code string. But I could've just passed the filled-in code string instead of the eval'ed+dumped+deparsed code string.
Re: Knowing when a code string contains code that will be executed at compile-time/end? by kcott (Archbishop) on Aug 15, 2020 at 11:44 UTC
G'day perlancar,
"I'm thinking of PPI right now, but that seems too heavy."
[Disclaimer: I've never used either PPI or PPR.]
I was wondering if PPR might be a less-heavy option. A quick reading of "PPR: Comparison with PPI" suggests it might be.
I first heard of PPR in "Keynote by Damian Conway - "Three Little Words" - YouTube". If nothing else, that should provide about an hour of very good entertainment; I thoroughly enjoyed watching it.
-- Ken
Re^2: Knowing when a code string contains code that will be executed at compile-time/end? by haukex (Archbishop) on Aug 15, 2020 at 16:27 UTC
Although PPR can be used to match Perl code or parts of Perl code (like say if you're trying to implement your own keyword and need a way to express the grammar for it), it is not a real parser, that is, it doesn't really build a syntax tree that can then be inspected. In that respect, PPI is better.
Re: Knowing when a code string contains code that will be executed at compile-time/end? by LanX (Saint) on Aug 15, 2020 at 09:38 UTC
I never used it and can't tell if it covers all edge cases. But you may want to play around with `Safe` to disallow compile time code.
Cheers Rolf
Re^2: Knowing when a code string contains code that will be executed at compile-time/end? by afoken (Chancellor) on Aug 15, 2020 at 12:19 UTC
"you may want to play around with Safe to disallow compile time code"
Of course, that requires a lot of trust in Safe and its configuration having no errors. See Re^3: unable to eval dumped hash.
Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
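For reference, a minimal sketch of what trying `Safe` might look like, assuming the default compartment settings (where the `require` op behind `use` is not permitted). It only shows that a `use` statement is refused at compile time. It does not show that all compile-time code is blocked: a `BEGIN` block whose body uses only permitted ops is not covered by this example, which is part of why the trust caveat above matters.

```perl
use strict;
use warnings;
use Safe;

my $cpt = Safe->new;    # default opmask; `require` is not permitted

# An ordinary expression compiles and runs inside the compartment.
my $ok = $cpt->reval(q{ 1 + 2 });
print "plain expression: ", (defined $ok ? $ok : "rejected: $@"), "\n";

# `use` needs `require` at compile time, so compilation is refused
# and reval returns undef with the reason in $@.
my $bad = $cpt->reval(q{ use POSIX; 1 });
print "with use: ", (defined $bad ? "$bad\n" : "rejected: $@");
```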
Re: Knowing when a code string contains code that will be executed at compile-time/end? by perlfan (Vicar) on Aug 15, 2020 at 07:47 UTC
I think the main question is: can these blocks/statements be obfuscated to the point where they are not detected via simple static analysis using a regex? What I mean by this is, can't you just check for `use`, `BEGIN`, etc. with a regular expression? Based on your description, I don't think you need to answer the question "is this valid Perl?", do you?
Re^2: Knowing when a code string contains code that will be executed at compile-time/end? by haukex (Archbishop) on Aug 15, 2020 at 08:30 UTC
First, it is uncool to update a node in a way that renders replies confusing or meaningless. Please mark your significant updates as such.
Since this is a potentially security-related question, I decided to make this comment: Your posting history contains quite a few helpful posts. However, recently, you also appear to have been making a lot of posts that are clearly guesses, or in some cases, are frankly wrong, like I think this node was before you edited it. When I am guessing at a reply I try to say so clearly, or when I don't know the answer at all, I don't reply (e.g. you don't see me replying to Tk questions). My suggestion is that it would be better if you focus on quality instead of quantity.

