PerlMonks  

Knowing when a code string contains code that will be executed at compile-time/end?

by perlancar (Hermit)
on Aug 15, 2020 at 05:11 UTC ([id://11120760])

perlancar has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I am given a code string by a user, and I want to know whether the code contains statements or special blocks that are executed outside of regular run time. For example:

    use Foo "bar"; bar();          # yes
    bar(); baz() if $qux;          # no
    bar(); END { warn "qux" }      # yes

The reason is that I need to eval the code string and later dump the code back to a string, and these statements/blocks do not get included. Demonstration:

    % perl -MData::Dumper -E'$Data::Dumper::Deparse=1; print Dumper(eval q[sub {use File::chdir; local $CWD="/"; END { warn } }])'
    $VAR1 = sub {
        use feature 'current_sub', 'bitwise', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
        local $CWD = '/';
    };
    Warning: something's wrong at (eval 5) line 1.

I want to be able to warn the user when her code contains these things. I'm thinking of PPI right now, but that seems too heavy. Ideas on other ways to do this?

Re: Knowing when a code string contains code that will be executed at compile-time/end?
by davido (Cardinal) on Aug 15, 2020 at 16:21 UTC

    Despite my misgivings over the thing you're trying to do -- its security implications and the impression I get that you're dealing with an XY Problem -- I played with this a little last night. I had some theories:

    If I could use Tie::StdScalar to tie the ${^GLOBAL_PHASE} variable to a class that warns when entering a phase other than RUN, I might be able to detect what is in the pad during those other phases, and so determine whether other code had been executed. It was a long shot, and it ultimately didn't work: tying ${^GLOBAL_PHASE} is disallowed because it is read-only.

    But that would only have been a kludge anyway. I might have been able to throw an exception on entering CHECK or START, but those phases should occur any time code is evaled, so it would have led to the next problem: determining whether there was custom code in those stages. That's probably nearly impossible. I hadn't really thought that part through, and it would probably never have worked. But as I mentioned, tie was a non-starter due to the read-only nature of ${^GLOBAL_PHASE}.

    Another possibility might have been using attributes, which are phase-aware. But then how to trigger them? It didn't seem like there was a solution there either.

    Plus, I think your goal is to disallow code that has custom code in the BEGIN, END, DESTROY, CHECK, INIT, and UNITCHECK phases. But END blocks don't run within the eval: if there's an END in the eval, it gets queued up to run at the end of runtime, not at the end of the eval scope (see perlmod). Additionally, you really want to detect these phases before any code runs in them. I just don't see it happening.
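    A minimal sketch (core Perl only, 5.14 or later for ${^GLOBAL_PHASE}) of two points above. perlvar documents that a compile step entered at run time, such as a string eval, is not a global interpreter phase, so a BEGIN block inside the eval reports RUN; and an END block compiled by the eval is queued until the whole program exits. The array @seen and the messages are illustrative only.

```perl
use strict;
use warnings;

our @seen;    # records what ${^GLOBAL_PHASE} reports inside the eval

my $code = q{
    BEGIN { push @main::seen, "BEGIN block sees ${^GLOBAL_PHASE}" }
    push @main::seen, "eval body sees ${^GLOBAL_PHASE}";
    END { print "END block from the eval runs during ${^GLOBAL_PHASE}\n" }
    1;
};

eval $code or die $@;
print "$_\n" for @seen;
print "main program finished\n";
```

    Run as a script, the two @seen lines and "main program finished" should print first, and the END block's line last, reporting the END phase.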

    The alternative is to parse Perl, but then you run into the Halting Problem. Most naive approaches, such as a pattern match, can easily be foiled by someone embedding a simple tr/// within the eval to hide a second level of eval behind a ROT13 cipher. It is nearly impossible to programmatically grok what is going to happen inside a string eval without doing a string eval.
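    To make the ROT13 point concrete, here is a sketch with a hypothetical piece of user code that stores an END block ROT13-encoded and only decodes it with tr/// at run time. A regex scan of the submitted text finds nothing; decoding the payload by hand shows what the second-level eval would compile. Nothing in the sketch is actually eval'ed.

```perl
use strict;
use warnings;

# Hypothetical user-supplied code: the END block is stored ROT13-encoded and only
# decoded (with tr///) and compiled by a second-level eval at run time.
my $user_code = q{my $c = 'RAQ { jnea "dhk" }'; $c =~ tr/A-Za-z/N-ZA-Mn-za-m/; eval $c;};

# A naive static check on the submitted text.
my $flagged = $user_code =~ /\bEND\b/ ? 1 : 0;
print "regex check flagged the code: ", ($flagged ? 'yes' : 'no'), "\n";

# Decoding the payload by hand shows what the second-level eval would compile.
(my $payload = 'RAQ { jnea "dhk" }') =~ tr/A-Za-z/N-ZA-Mn-za-m/;
print "hidden payload: $payload\n";
```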

    I just don't think you're going to find a solution that protects you from code running in phases you don't want it to. But if you're string evaling code a user gave you, global phases are the least of your risks.


    Dave

Re: Knowing when a code string contains code that will be executed at compile-time/end? (updated)
by haukex (Archbishop) on Aug 15, 2020 at 08:03 UTC
    I am given a code string by user, and wants to know if the code contains statements or special blocks that are executed not in regular run time.

    Impossible with a static parse (Update: the classic links on "only perl can parse Perl": On Parsing Perl and Perl Cannot Be Parsed: A Formal Proof, and tye's reply to the latter). Consider for example eval(uc('end').'{...}') or something more convoluted, like variations on s s s END { ... } see (Update 3: like s x x qq s \Uens.'D{...}' xexe).

    The only way to do this safely is to limit the user to a subset of Perl that is statically parseable. See the new module standard and Sawyer's recent talk on it.

    Update 2: As LanX points out, B::Deparse is not perfect, and using standard unfortunately doesn't protect you from those issues either. It would be possible to keep both the original standard-conforming string (checked to ensure it doesn't contain any BEGIN, eval, do, use, and so on) and its evaled coderef, e.g. in an object that could overload stringification and coderef-dereferencing, but that might be overkill depending on what you're trying to do.
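    A minimal sketch of such an object, using the core overload pragma: stringifying it gives back the original source, and calling it as a coderef runs the compiled version. The class name CodeWithSource is hypothetical, and the check that the string conforms to standard (no BEGIN, eval, do, use, and so on) is deliberately left out.

```perl
use strict;
use warnings;

package CodeWithSource;    # hypothetical class name
use overload
    '""'  => sub { $_[0]{source} },    # stringify: the original source text
    '&{}' => sub { $_[0]{code} },      # call as coderef: the compiled sub
    fallback => 1;

sub new {
    my ($class, $source) = @_;
    # NOTE: a real version would first verify $source is safe to compile.
    my $code = eval "sub { $source }" or die "could not compile: $@";
    return bless { source => $source, code => $code }, $class;
}

package main;

my $obj = CodeWithSource->new(q{ return join '-', @_ });
print "source: $obj\n";
print "result: ", $obj->('a', 'b'), "\n";
```

    Because the source string is stored rather than regenerated, nothing depends on B::Deparse reproducing use lines or END blocks.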

        They should show up when applying B::Deparse

        Nope, as the OP shows. Plus deparsing requires an eval of the string first.

        use warnings;
        use strict;
        use B::Deparse;

        my $code = q{ print "code\n"; s s s END { print "end\n" } see };
        my $coderef = eval qq{ sub { $code } };
        my $deparse = B::Deparse->new();
        $deparse->ambient_pragmas(strict => 'all', warnings => 'all');
        print $deparse->coderef2text($coderef), "\n";

        __END__
        {
            print "code\n";
            s/ /();/ee;
        }
        end

Re: Knowing when a code string contains code that will be executed at compile-time/end? (Update)
by LanX (Saint) on Aug 15, 2020 at 09:16 UTC
    I don't really understand what your goal is.

    If you eval a code string just keep the string.

    It's nearly impossible to statically parse Perl for compile-time code; it can even be hidden inside a regex.°

    The only approach I could think of is to eval it and make the compiler die if it encounters a use or BEGIN or similar. ◇

    And I'm not even sure that is feasible.

    On another note: better not rely on B::Deparse; it's only a 95% solution.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) See Re: Vulnerabilities when editing untrusted code... (Komodo)

    ◇) probably with Safe

      You might have hit the nail on the head here. I am working on code string templates, and want to separate the process that fills in the templates from the process that runs the filled-in code string. But I could've just passed the filled-in code string instead of the eval'ed+dumped+deparsed code string.
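      A sketch of that approach, assuming a hypothetical <<NAME>> placeholder syntax: fill the template by plain substitution and hand the resulting string to whatever runs it. Because the string is never eval'ed, dumped, and deparsed in between, the use line survives intact. List::Util is a core module.

```perl
use strict;
use warnings;

# Hypothetical template syntax: <<NAME>> placeholders filled by simple substitution.
my $template = q{use List::Util 'sum'; print sum(<<NUMS>>), "\n";};
my %vars     = (NUMS => '1, 2, 3');

(my $filled = $template) =~ s/<<(\w+)>>/$vars{$1}/g;

# Hand $filled (a plain string) to the process that runs it; the "use" line
# survives because nothing was eval'ed, dumped, or deparsed in between.
print "passing on: $filled\n";
eval $filled;
die $@ if $@;
```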
Re: Knowing when a code string contains code that will be executed at compile-time/end?
by kcott (Archbishop) on Aug 15, 2020 at 11:44 UTC

      Although PPR can be used to match Perl code or parts of Perl code (like say if you're trying to implement your own keyword and need a way to express the grammar for it), it is not a real parser, that is, it doesn't really build a syntax tree that can then be inspected. In that respect, PPI is better.
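      For comparison, a sketch of the PPI route mentioned in the question. It assumes PPI is installed from CPAN and relies on its PPI::Statement::Include (use/no/require statements) and PPI::Statement::Scheduled (BEGIN, CHECK, UNITCHECK, INIT, END blocks) classes. The helper phased_statements is hypothetical, and like any static scan it will not see code assembled at run time, such as the eval(uc('end').'{...}') example above.

```perl
use strict;
use warnings;
use PPI;    # CPAN module, not core

# Return the kinds of compile-time/phased statements PPI can see statically.
sub phased_statements {
    my ($code) = @_;
    my $doc = PPI::Document->new(\$code) or die "PPI could not parse the code\n";
    my @found;
    # find() returns an arrayref, '' when nothing matches, or undef on error.
    for my $inc (@{ $doc->find('PPI::Statement::Include') || [] }) {
        my $type = $inc->type // '';
        push @found, $type if $type eq 'use' || $type eq 'no';    # skip runtime require
    }
    for my $blk (@{ $doc->find('PPI::Statement::Scheduled') || [] }) {
        push @found, $blk->type;    # BEGIN, CHECK, UNITCHECK, INIT or END
    }
    return @found;
}

for my $code (q{use Foo "bar"; bar();}, q{bar(); baz() if $qux;}, q{bar(); END { warn "qux" }}) {
    my @found = phased_statements($code);
    printf "%-30s %s\n", $code, @found ? "yes (@found)" : 'no';
}
```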

Re: Knowing when a code string contains code that will be executed at compile-time/end?
by LanX (Saint) on Aug 15, 2020 at 09:38 UTC
    I never used it and can't tell if it covers all edge cases.

    But you may want to play around with Safe to disallow compile time code.
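    A small sketch of the Safe idea, using the core Safe module with its default operator mask, which does not permit the require op that every use line compiles into. It only shows a use statement being rejected while the compartment compiles the code; whether BEGIN or END blocks built from permitted ops are caught is not demonstrated here, and the caveat in the reply below about trusting Safe's configuration still applies.

```perl
use strict;
use warnings;
use Safe;

my $cpt = Safe->new;    # default opmask: the 'require' op is not permitted

for my $code (q{my $x = 1 + 2; $x}, q{use Foo "bar"; 1}) {
    my $result = $cpt->reval($code);
    if ($@) {
        print "rejected: $code\n  $@";    # e.g. "'require' trapped by operation mask"
    }
    else {
        print "ok: $code => $result\n";
    }
}
```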

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      you may want to play around with Safe to disallow compile time code

      Of course, that requires a lot of trust in Safe and its configuration having no errors. See Re^3: unable to eval dumped hash.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Knowing when a code string contains code that will be executed at compile-time/end?
by perlfan (Vicar) on Aug 15, 2020 at 07:47 UTC
    I think the main question is, can these blocks/statements be obfuscated to the point where they are not detected via simple static analysis using a regex?

    What I mean is: can't you just check for use, BEGIN, etc. with a regular expression? Based on your description, I don't think you need to answer the question "is this valid Perl?", do you?
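    A sketch of the regex check being suggested, to make its limits concrete. The checker looks_phased is hypothetical. It misses the eval(uc('end').'{...}') trick from haukex's reply, which builds an END block at run time, and it also produces false positives, such as the word "no" inside a string.

```perl
use strict;
use warnings;

# Hypothetical naive checker: looks for keywords that run outside normal run time.
sub looks_phased {
    my ($code) = @_;
    return $code =~ /\b(?:use|no|BEGIN|UNITCHECK|CHECK|INIT|END)\b/ ? 1 : 0;
}

my @samples = (
    q{use Foo "bar"; bar();},
    q{bar(); baz() if $qux;},
    q{bar(); END { warn "qux" }},
    q{bar(); eval(uc('end') . '{ warn "qux" }');},    # builds "END{...}" at run time
    q{print "no problem\n";},                         # harmless, but contains "no"
);

printf "%-48s %s\n", $_, looks_phased($_) ? 'flagged' : 'not flagged' for @samples;
```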

      First, it is uncool to update a node in a way that renders replies confusing or meaningless. Please mark your significant updates as such.

      Since this is a potentially security related question, I decided to make this comment: Your posting history contains quite a few helpful posts. However, recently, you also appear to have been making a lot of posts that are clearly guesses, or in some cases, are frankly wrong, like I think this node was before you edited it. When I am guessing at a reply I try to say so clearly, or when I don't know the answer at all, I don't reply (e.g. you don't see me replying to Tk questions). My suggestion is that it would be better if you focus on quality instead of quantity.

Node Type: perlquestion [id://11120760]
Front-paged by Corion