 
PerlMonks  

Re: How to parse a limited grammar

by TedYoung (Deacon)
on Jan 10, 2008 at 16:35 UTC


in reply to How to parse a limited grammar

Well, if you are confident that the users of your software are trustworthy, you could simply apply one or more s///'s to the value and eval it:

    $value =~ s/\./->/g;
    ...
    my $generator = eval "sub { $value }";
    die if $@;
    for (@rows) {
        ...
        print $generator->();
        ...
    }

This gives you tremendous power with a minimum amount of work. Though, if you don't completely trust your users, it may give you too much power. You could limit that with Safe, but that has been shown not to be a complete sandbox (it makes it much harder for the user to inadvertently screw up things, but if they have their heart set on it, they can exploit certain flaws).
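If you do want to try the Safe route, a minimal sketch could look like this. It assumes the core Safe module with its default operator mask, and the two expression strings are made up purely for illustration. It only shows that some operations get trapped; it does not make the compartment a complete sandbox.

```perl
use strict;
use warnings;
use Safe;

# A compartment using Safe's default operator mask.
my $cpt = Safe->new;

# Plain arithmetic is allowed by the default mask.
my $ok     = $cpt->reval('2 + 3 * 4');
my $ok_err = $@;

# system() is not in the default mask, so the code is rejected:
# reval returns undef and the reason is left in $@.
my $bad     = $cpt->reval('system("echo hi")');
my $bad_err = $@;

print "allowed expression gave: $ok\n" unless $ok_err;
print "rejected expression: $bad_err"  if $bad_err;
```

Checking `$@` after every `reval` matters here, because a rejected expression and an expression that legitimately returns undef look the same otherwise.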

HOP::Parser is great but very, very slow. You only need it if you are parsing streams. Here you have very short strings and, if the above solution won't work for you, you can use traditional lexing semantics to build up a list of tokens. Your grammar is probably simple enough that you can just iterate over those tokens and generate valid Perl to be eval'ed (or derive the value as you go).

    # untested, just pseudo code
    my @tokens;
    local $_ = $value;
    while ( !/\G$/gc ) {    # while not at end of string
        push @tokens,
            /\G\./gc             ? [ 'DOT', '.' ]
          : /\G(\$\w+)/gc        ? [ 'VAR', $1 ]
          : /\G(\w+)\((.*?)\)/gc ? [ 'METHOD', $1, [ split /\s*,\s*/, $2 ] ]
          : /\G(\w+)/gc          ? [ 'METHOD', $1, [] ]
          : ...
          last
    }

Note that if you want proper nesting of () in method calls, you will need to use a better regex. Continue to add tokenizers for operators (|, &) etc. When done, just iterate over the list of tokens and generate either the value or if necessary generate trusted perl code and eval it.
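A runnable take on the hand-rolled lexer might look like the sketch below. The `tokenize` helper and the sample expression are hypothetical, it only knows the DOT, VAR, and METHOD tokens from the pseudo code, and it makes no attempt at nested parentheses or operators. Anything it does not recognise makes it die rather than silently stop.

```perl
use strict;
use warnings;

# Hypothetical helper: split a value expression into [TYPE, text, args] tokens.
sub tokenize {
    my ($value) = @_;
    my @tokens;

    # \G anchors each match at the current pos() of $value;
    # /c keeps pos() in place when a match fails.
    until ( $value =~ /\G\z/gc ) {
        if ( $value =~ /\G\./gc ) {
            push @tokens, [ 'DOT', '.' ];
        }
        elsif ( $value =~ /\G(\$\w+)/gc ) {
            push @tokens, [ 'VAR', $1 ];
        }
        elsif ( $value =~ /\G(\w+)\((.*?)\)/gc ) {
            # Copy the captures before running another regex in split.
            my ( $name, $args ) = ( $1, $2 );
            push @tokens, [ 'METHOD', $name, [ split /\s*,\s*/, $args ] ];
        }
        elsif ( $value =~ /\G(\w+)/gc ) {
            push @tokens, [ 'METHOD', $1, [] ];
        }
        else {
            my $offset = pos($value) || 0;
            die "Unexpected input at offset $offset in '$value'\n";
        }
    }
    return @tokens;
}

# Single quotes so that $order is not interpolated here.
my @tokens = tokenize('$order.customer.format(short, upper)');
print join( ' ', map { $_->[0] } @tokens ), "\n";
```

The order of the branches matters: the method-with-arguments pattern has to be tried before the bare-word pattern, otherwise `format(short, upper)` would be cut off at `format`.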

update:

    # again, untested, just for demonstration
    my $code = '';
    for (@tokens) {
        my ($type, $source, @params) = @$_;
        if ($type eq 'VAR') {
            $code .= $source;
        } elsif ($type eq 'DOT') {
            $code .= '->';
        } elsif ($type eq 'METHOD') {
            ....
        }
    }
    my $generator = eval "sub { $code }";
    die if $@;
    for (@rows) {
        ...
        print $generator->();
        ...
    }
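One way the METHOD branch could be filled in is sketched below. This is an assumption, not something spelled out in the thread: it treats every method argument as a plain string literal and quotes it, so nothing from the argument list ends up as executable Perl. The `generate_code` and `quote_literal` helpers and the hard-coded token list are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: render text as a single-quoted Perl string literal.
sub quote_literal {
    my ($text) = @_;
    $text =~ s/([\\'])/\\$1/g;    # escape backslashes and single quotes
    return "'$text'";
}

# Turn [TYPE, text, args] tokens into a string of Perl source.
sub generate_code {
    my (@tokens) = @_;
    my $code = '';
    for my $token (@tokens) {
        my ( $type, $source, $args ) = @$token;
        if ( $type eq 'VAR' ) {
            $code .= $source;
        }
        elsif ( $type eq 'DOT' ) {
            $code .= '->';
        }
        elsif ( $type eq 'METHOD' ) {
            # Assumption: every argument is a literal string.
            $code .= $source . '('
                . join( ', ', map { quote_literal($_) } @$args ) . ')';
        }
        else {
            die "Unknown token type '$type'\n";
        }
    }
    return $code;
}

my @tokens = (
    [ 'VAR',    '$order' ],
    [ 'DOT',    '.' ],
    [ 'METHOD', 'customer', [] ],
    [ 'DOT',    '.' ],
    [ 'METHOD', 'format', [ 'short', 'upper' ] ],
);
print generate_code(@tokens), "\n";
```

Because the generator only ever emits variable names, `->`, method names, and quoted literals, the string handed to eval is built from trusted pieces even if the original value was not.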

update 2:

If you choose either eval option above, you can expose variables to your value code like this:

    # note the \@_ : an unescaped @_ would be interpolated into the string
    my $generator = eval "sub { my (\$order, \$order_line) = \@_; $code }";
    ...
    $generator->($row->{order}, $row->{order_line});

This exposes variables to your value code without the need of globals and namespaces.
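Put together, passing the variables in as arguments might look like the following sketch. The `My::Order` and `My::OrderLine` classes, the row data, and the contents of `$code` are stand-ins invented for the example, since the real objects are not shown in the thread.

```perl
use strict;
use warnings;

# Tiny stand-in classes (hypothetical).
{
    package My::Order;
    sub new      { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub customer { $_[0]{customer} }
}
{
    package My::OrderLine;
    sub new { my ( $class, %args ) = @_; return bless {%args}, $class }
    sub qty { $_[0]{qty} }
}

# Pretend this string came out of the translation step.
my $code = '$order->customer() . " x" . $order_line->qty()';

# \$ and \@ keep the outer string from interpolating those names;
# $code is interpolated on purpose.
my $generator = eval "sub { my (\$order, \$order_line) = \@_; $code }"
    or die "Could not compile '$code': $@";

my @rows = (
    {
        order      => My::Order->new( customer => 'alice' ),
        order_line => My::OrderLine->new( qty => 2 ),
    },
    {
        order      => My::Order->new( customer => 'bob' ),
        order_line => My::OrderLine->new( qty => 5 ),
    },
);

# Compile once, then call the generated sub for every row.
my @output = map { $generator->( $_->{order}, $_->{order_line} ) } @rows;
print "$_\n" for @output;
```

The string eval happens once, outside the loop, so each row only pays for an ordinary sub call.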

Ted Young

It is almost impossible for me to read contemporary mathematicians who, instead of saying "Petya washed his hands," write simply: "There is a t1 < 0 such that the image of t1 under the natural mapping t1 -> Petya(t1) belongs to the set of dirty hands, and a t2, t1 < t2 <= 0, such that the image of t2 under the above-mentioned mapping belongs to the complement of the set defined in the preceding sentence."
The Russian mathematician V. I. Arnol'd

Re^2: How to parse a limited grammar
by clinton (Priest) on Jan 10, 2008 at 16:55 UTC

    Thanks for the reply Ted

    Yes, I had considered just eval'ing the code - as you surmised, this will be added by trusted users only. Paranoia, an aversion to string evals, and a desire to learn about parsing led me to this post. But evals may yet be the way to go.

    "HOP::Parser is great but very, very slow."

    That is what I feared - good to know

    Probably my main reason for looking at a proper parser solution was to be able to handle nested expressions and logic branches (terminology?). I didn't want to waste time going down one road if it was obvious to everybody else that I shouldn't bother.

    Given what you've said, I'm going to give it a go, and just see where it takes me.

    thanks again

      Good. But now I feel bad. I want to qualify my statement about HOP::Lexer being slow (which is a relative statement). It is much faster than some alternatives but is much slower than lexing by hand (as shown above). I found that, in my grammars, lexing by hand was 10 times faster. That is not because HOP::Lexer is bad, but because it has to contend with streams, a feature that makes it much more powerful than lexing by hand, but completely unnecessary for what we are doing here.

      HOP::Lexer and Higher-Order Perl are good products!

      Ted Young

