PerlMonks  

"eval"ing a hash without eval

by Ovid (Cardinal)
on Dec 29, 2005 at 01:41 UTC ( [id://519695]=perlquestion )

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

In trying to run some code with taint checking, I found the following snippet:

local $/; my %conf = eval <CONF>;

Needless to say, that fails in taint mode. I had three choices. I could either spend two or three days ripping out all of the configuration code which writes to that file and replace it with a standard configuration set up, find a module on the CPAN which implements this or I could reinvent the wheel.

As you can see from the code, whatever is in CONF should eval to a hash. The only thing I found which does that directly was Parse::PerlConfig. Unfortunately, that uses eval. Given the sheer number of config modules to wade through, I suspect there might be something which handles this properly, but I couldn't find it. The config file is in a simple format like this:

    #!/usr/bin/perl

    # Which store are we using?
    store => {
        class => 'Store::DB::SQLite',
    },

    # Configuration for the PostgreSQL data store.
    # pg => {
    # }

    # Where is the data store?
    sqlite => {
        file => 't/data/store.db',
    },

Basically, we have a hash of hashes. Each contained hash is guaranteed to be a simple list of key/value pairs. As a quick hack, I threw together this:

    use Regexp::Common;

    {
        my $shebang_re   = qr/#!\S*/;
        my $bareword_re  = qr/[[:word:]]+/;
        my $quoted_re    = $RE{quoted};
        my $comma_re     = qr/(?:=>|,)/;
        my $n_re         = qr/\s*(?:\n|\r)?\s*/;
        my $pair_re      = qr/\s*$bareword_re\s*$comma_re\s*(?:$bareword_re|$quoted_re)/;
        my $hash_body_re = qr/\s*{\s*(?:$pair_re\s*$comma_re\s*)*\s*(?:$pair_re\s*$comma_re?\s*)\s*}\s*/;
        my $comment_re   = qr/(?:^\s*#.*$n_re)*/m;
        my $hash_re      = qr/\s*$comment_re?\s*$bareword_re\s*$comma_re\s*$hash_body_re\s*/;
        my $hashes_re    = qr/\s*(?:$hash_re\s*$comma_re)*\s*(?:$hash_re\s*$comma_re?\s*)/;
        my $conf_re      = qr/$shebang_re?\s*$hashes_re\s*/;

        sub _untaint_config {
            my $_conf = shift;
            my ($conf) = $_conf =~ /^($conf_re)$/sm;
            return $conf;
        }

        # testing hooks
        if ( $ENV{HARNESS_ACTIVE} ) {
            *_comma_re     = sub { $comma_re };
            *_comment_re   = sub { $comment_re };
            *_pair_re      = sub { $pair_re };
            *_hash_body_re = sub { $hash_body_re };
            *_conf_re      = sub { $conf_re };
        }
    }

That's phenomenally ugly, but it works. It lets me do this:

local $/; my %conf = eval _untaint_config(<CONF>);

It also tightly restricts the use of eval and since this is deliberately Perl-like without actually being Perl, I can't see how someone would accidentally slip in naughty data. The problem is, that is so ugly that I don't want it in our code and I would be much happier if there's already a module out there which would handle this more gracefully. Further, while it works and passes my tests, I'm sure it has bugs. I want: no eval and a very restricted syntax. Is that out there?

Side note: yeah, I should probably convert that to Regexp::Assemble but it's surprisingly fast.

Cheers,
Ovid

New address of my CGI Course.

Replies are listed 'Best First'.
Re: "eval"ing a hash without eval
by Aristotle (Chancellor) on Dec 29, 2005 at 03:04 UTC

      Hmm, if I have to go that route, I may as well break out Hop::Lexer and Hop::Parser. That would be much faster. Actually, I could do it with just the lexer and it would be pretty easy.

      Cheers,
      Ovid


Re: "eval"ing a hash without eval
by sgifford (Prior) on Dec 29, 2005 at 02:37 UTC
    In trying to run some code with taint checking, I found the following snippet:
    local $/; my %conf = eval <CONF>;

    Needless to say, that fails in taint mode.

    How about if you add:
    use IO::Handle; CONF->untaint;
    This snippet, for example, seems to work as expected:
        #!/usr/bin/perl -Tw
        use strict;
        use IO::Handle;
        use Data::Dumper;

        open(CONF,"< t41.conf") or die "couldn't open conf: $!\n";
        CONF->untaint;
        my %conf;
        {
            local $/;
            undef $/;
            %conf = eval <CONF>;
        }
        print Dumper \%conf;

      Yes, but my method is suitably paranoid and I don't have to worry about unsafe data getting in there. I should take a tip from Aristotle and go ahead and lex things properly. I'll avoid the entire eval scenario altogether.

      Cheers,
      Ovid


Re: "eval"ing a hash without eval
by BrowserUk (Patriarch) on Dec 29, 2005 at 05:36 UTC

    Please don't think I am "having a go", because I am not, but I don't get this. And I don't get it at multiple levels.

    1. You have a working solution. You cannot find an off-the-shelf CPAN solution. But you are looking to replace your working solution with someone else's (external to your organisation) code?

      The point of CPAN, and code reuse in general, is to leverage existing solutions. You have an existing solution! Why replace it?

      I've seen the claim made that you avoid maintenance by using CPAN, because someone else maintains them, but that makes no sense. Maintenance is only required if something breaks; or changes.

      • In the former case, if the code is in-house, if no one changes it, nothing will break.
      • Unless something changes in the calling code, in which case it would break the CPAN code as well.

      With the code in house, you can make the changes easily and quickly.

      With a CPAN module you would have to

      1. negotiate with a third party to agree a change is needed;
      2. agree what that change should be;
      3. await their pleasure in making and testing and shipping that change.

      And you still have to test their changes are compatible with your code-base and don't break anything else.

      And you create a dependency, and subject your code to the vulnerability that the author may change his module in a way that breaks your code in some future release.

      Where is the saving, ROI, code-reuse that comes from discarding your in-house working solution, for a speculative, third party equivalent?

    2. This "config file" presumably lives in the same place as the rest of your code-base?

      But,

      1. use boils down to require.
      2. require boils down to do.
      3. do boils down to eval.

      In other words, every piece of Perl code you run, gets eval'd.
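A minimal sketch of the last step in that chain, for anyone who wants to see it: `do FILE` behaves largely like a string eval of the file's contents (perlfunc notes some differences, such as file name tracking and the @INC search). The temporary file and its contents are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a tiny Perl fragment to a temporary file (hypothetical content).
my ( $fh, $filename ) = tempfile( SUFFIX => '.conf', UNLINK => 1 );
print {$fh} "store => { class => 'Store::DB::SQLite' },\n";
close $fh or die "close failed: $!";

# 1. Let perl load and run the file itself.
my %via_do = do $filename;

# 2. Slurp the file and string-eval it by hand.
my $source = do {
    open my $in, '<', $filename or die "open failed: $!";
    local $/;
    <$in>;
};
my %via_eval = eval $source;

print "do:   $via_do{store}{class}\n";
print "eval: $via_eval{store}{class}\n";
```

Either way the file's contents run as Perl, which is why the question of who can write to the file matters.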

      Why is it good enough for the rest of your code-base, and not for this piffling little config file?

      On that basis, you will need to write your own Perl Interpreter, (in some language other than Perl!), so that you can untaint the rest of your code-base?

    3. If you really do have to untaint the contents of this config file, wouldn't it be simpler and quicker to change the format of the file so that it didn't mimic executable code?

      You could then use one of the dozens of existing config modules that doesn't use eval.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      These are very easy questions to answer.

      Why replace my working code?
      Because I hacked something together quickly and it's probably not very robust. If there is something out there better tested, I want it. If I can't get the author to respond to problems, forking is trivial. (But see my comment below)
      Every piece of code you run gets eval'd.
      I know where my code comes from but I can't guarantee the source of that config file. Its location is set by an environment variable and I can't guarantee someone won't hand edit that file. That's a whopping huge security hole.
      Wouldn't it be faster to change the file format?
      No. It would take far longer. That config file is autogenerated. As mentioned in my post, it would take me two or three days (I hope) to rip out everything which writes to that file and replace it. Instead, I hacked a solution in a couple of hours.

      I will agree though that too much reliance on external modules is problematic. For bigger things we don't have the time to do, maybe that's OK. For smaller things, maybe forking or cribbing ideas is a better bet.

      Cheers,
      Ovid


        I know where my code comes from but I can't guarantee the source of that config file. Its location is set by an environment variable and I can't guarantee someone won't hand edit that file. That's a whopping huge security hole.
        To clarify: the code runs with some sort of special privileges, which allow a user to do things they wouldn't otherwise be able to do, and also gets its configuration from an environment variable that the user has control over? And the user can perform inappropriate actions by putting code into the config file, but not by making any other changes to the file?
Re: "eval"ing a hash without eval
by ambrus (Abbot) on Dec 29, 2005 at 11:08 UTC

    You are right, it has bugs. Perl is not that easy to parse.

    This regexp, for example, allows the following string

        #!perl
        qq, {z => "${warn qq/hello world/}" },
    which, when evalled, prints a warning message. You can imagine that I could put more insecure code in there than that.
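A small sketch that shows the effect without the regexp involved, assuming the shebang sits on its own line in the string. The `__WARN__` handler and the variable names are only there for the demonstration: they collect the warning so it is visible that `warn` really ran inside what was supposed to be data.

```perl
use strict;
use warnings;

# The string from the post above, with the shebang on its own line.
my $evil = <<'END_EVIL';
#!perl
qq, {z => "${warn qq/hello world/}" },
END_EVIL

# Collect warnings instead of printing them, to show that code ran.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my @ignored = eval $evil;    # what eval returns is not the point here
}

my ($hit) = grep { /hello world/ } @warnings;
print "warn() ran inside the 'config': $hit" if defined $hit;
```

The `qq, ... ,` construct is a double-quoted string with commas as delimiters, so the `${ ... }` block inside it is interpolated, and therefore executed, at eval time.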

    My advice is that you don't try to evaluate untrusted perl (or shell or ruby) code, as you just can't launder it clean by parsing it.

    Interpreting the code you're trying to parse yourself would be a much better idea. I'm as surprised as you there's no module for that.
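A rough sketch of what interpreting the format by hand could look like, with no eval anywhere. The grammar is an assumption based on the sample config in the original post, not on the real generator: sections are `bareword => { ... }`, values are barewords or single-quoted strings, and double-quoted strings are rejected outright so nothing can interpolate. The names `tokenize` and `parse_config` are made up for this example.

```perl
use strict;
use warnings;

# Split the text into [type, value] tokens; comments (and the shebang) are skipped.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (1) {
        next if $text =~ /\G\s+/gc;
        next if $text =~ /\G#[^\n]*/gc;
        last if $text =~ /\G\z/gc;
        if ( $text =~ /\G(=>|[{},])/gc ) {
            push @tokens, [ punct => $1 ];
        }
        elsif ( $text =~ /\G(\w+)/gc ) {
            push @tokens, [ word => $1 ];
        }
        elsif ( $text =~ /\G'((?:[^'\\]|\\.)*)'/gcs ) {
            # Single-quoted strings only: \\ and \' are the only escapes.
            ( my $string = $1 ) =~ s/\\(['\\])/$1/g;
            push @tokens, [ string => $string ];
        }
        else {
            die "Unexpected character at offset " . pos($text) . "\n";
        }
    }
    return @tokens;
}

# Build a hash of hashes from the token stream, dying on anything unexpected.
sub parse_config {
    my ($text) = @_;
    my @tokens = tokenize($text);
    my $i      = 0;

    # Consume the next token, dying unless it has the wanted type (and value).
    my $expect = sub {
        my ( $type, $value ) = @_;
        my $token = $tokens[$i] or die "Unexpected end of config\n";
        die "Expected $type, found '$token->[1]'\n"
            if $token->[0] ne $type
            or defined $value && $token->[1] ne $value;
        $i++;
        return $token->[1];
    };

    # True if the next token is the given punctuation.
    my $next_is = sub {
        my ($value) = @_;
        my $token = $tokens[$i];
        return $token && $token->[0] eq 'punct' && $token->[1] eq $value;
    };

    my %conf;
    while ( $i < @tokens ) {
        my $section = $expect->('word');
        $expect->( punct => '=>' );
        $expect->( punct => '{' );
        my %pairs;
        until ( $next_is->('}') ) {
            my $key = $expect->('word');
            $expect->( punct => '=>' );
            my $token = $tokens[ $i++ ] or die "Unexpected end of config\n";
            die "Expected a bareword or quoted string, found '$token->[1]'\n"
                unless $token->[0] eq 'word' or $token->[0] eq 'string';
            $pairs{$key} = $token->[1];
            $i++ if $next_is->(',');    # trailing comma is optional
        }
        $expect->( punct => '}' );
        $i++ if $next_is->(',');
        $conf{$section} = \%pairs;
    }
    return %conf;
}

my %conf = parse_config(<<'END_CONF');
#!/usr/bin/perl
# Which store are we using?
store => {
    class => 'Store::DB::SQLite',
},
# pg => {
# }
sqlite => {
    file => 't/data/store.db',
},
END_CONF

print "$_: ", join( ', ', map {"$_=$conf{$_}"} sort keys %{ $conf{$_} } ), "\n"
    for map { $_ } sort keys %conf;
```

Since every value comes out of a regexp capture, the result should also be untainted under -T (unless `use re 'taint'` is in effect), and anything outside the grammar dies instead of running.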

      Damn. You're right. I'm going to have to go the full lex/parse route to avoid this.

      Cheers,
      Ovid


Re: "eval"ing a hash without eval
by salva (Canon) on Dec 29, 2005 at 13:10 UTC
    How about using Safe? Usually it is not secure enough, but I believe in this case it will be, because populating a hash requires very few opcodes to be allowed and no interaction with other parts of your program.
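A minimal sketch of that idea, using a compartment with Safe's default opcode mask rather than a hand-picked one. Narrowing the mask further with `permit_only` is possible, but the exact opcode list a hash literal needs would have to be worked out by testing, so it is left out. The config text and variable names are illustrative.

```perl
use strict;
use warnings;
use Safe;

# Hypothetical config text in the format from the original post.
my $text = <<'END_CONF';
#!/usr/bin/perl
# Which store are we using?
store => {
    class => 'Store::DB::SQLite',
},
sqlite => {
    file => 't/data/store.db',
},
END_CONF

# A fresh compartment starts with Safe's default opcode mask.
my $cpt = Safe->new;

# Wrap the list in +{ ... } so a single hash reference comes back.
my $conf = $cpt->reval("+{\n$text\n}");
die "config rejected: $@" if $@;

print "store class: $conf->{store}{class}\n";

# Operations outside the mask, such as system(), are trapped at compile time.
$cpt->reval('system("echo pwned")');
my $blocked = $@;
print "blocked: $blocked" if $blocked;
```

Note that `reval` is still a string eval underneath, so under -T the text would presumably have to be untainted before Safe ever sees it; the compartment limits what the code can do, not where it came from.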
Re: "eval"ing a hash without eval
by qq (Hermit) on Dec 29, 2005 at 14:40 UTC

    Would it be that hard to convert to a sane setup? Make a little script that evals each config file as shown, then YAML::Dump's it out to another replacement config file...

    The calling code would (probably) be easy to replace. This would lose any comments. But you mentioned the config file is written to in multiple places, and I can't see how the comments would be getting added anyway in that case. I have a feeling I'm missing the complexity here...

        $ cat hash.conf
        foo => 'bar',
        baz => 'quux',

        $ perl -MYAML -0 -e '%h=eval<>; YAML::DumpFile( "$ARGV\.yml", \%h)' hash.conf

        $ cat hash.conf.yml
        ---
        baz: quux
        foo: bar

        $ perl -MYAML -MData::Dumper -e '$h = YAML::LoadFile( shift() ); print Dumper $h' hash.conf.yml
        $VAR1 = {
                  'baz' => 'quux',
                  'foo' => 'bar'
                };

      You write:

      Would it be that hard to convert to a sane setup?

      Ovid wrote:

      I could either spend two or three days ripping out all of the configuration code which writes to that file and replace it with a standard configuration set up, find a module on the CPAN which implements this or I could reinvent the wheel.

      And then Ovid wrote again:

      That config file is autogenerated. As mentioned in my post, it would take me two or three days (I hope) to rip out everything which writes to that file and replace it. Instead, I hacked a solution in a couple of hours.

      Your suggestion (or something very close) has already been made twice by others after it was preempted by the original node, and was then refuted once more in replies.

      Makeshifts last the longest.

        I had somehow missed the reply to BrowserUk when I posted. I did write I have a feeling I'm missing the complexity here. And it did occur to me that Ovid was unlikely to have overlooked any simple solutions.

        After seeing the reply to BrowserUk I considered updating my original. But I could think of nothing actually useful to add. So instead everybody gets your entirely accurate but somewhat snotty reply, and my waste-of-space defensive rebuttal. Luckily we are all professional enough not to take offense ;)
