http://qs321.pair.com?node_id=414823

samtregar has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I'm working on a hairy parsing problem in HTML::Template::Expr and I'm not making much progress. In case you're not familiar, HTML::Template::Expr is an add-on to HTML::Template which adds basic expression support to the templating language. Stuff like:

<tmpl_if expr="color eq 'blue'">BLUE</tmpl_if>
<tmpl_var expr="my_func(color, 'foo')">
<tmpl_if expr="(color eq 'blue') || (color eq 'red')">

All the above expressions (and a lot more) work just fine. I parse them with a Parse::RecDescent grammar which produces a tree. Executing the tree is a piece of cake.

Now I want to support:

<tmpl_if expr="(color eq 'blue') || (color eq 'red') || (color eq 'black')">

But that won't parse with my grammar. The way I have it set up, each || requires exactly two operands and must be enclosed in parens unless it's at the outermost scope. The outermost scope is special because my code will add an enclosing () if none is present. To do three conditions, this is required:

<tmpl_if expr="((color eq 'blue') || (color eq 'red')) || (color eq 'black')">

I've been hacking on the grammar all morning but nothing's working. I'm sure it's got something to do with leftop but what exactly is beyond me. If anyone could help I'd be greatly appreciative!

Here's the grammar, which you can also find in the latest version of the module:

expression : subexpression /^\$/ { \$return = \$item[1]; }

subexpression : binary_op             { \$item[1] }
              | function_call         { \$item[1] }
              | var                   { \$item[1] }
              | literal               { \$item[1] }
              | '(' subexpression ')' { \$item[2] }
              | <error>

binary_op : '(' subexpression op subexpression ')'
            { [ \$item[3][0], \$item[3][1], \$item[2], \$item[4] ] }

op : />=?|<=?|!=|==/     { [ ${\BIN_OP}, \$item[1] ] }
   | /le|ge|eq|ne|lt|gt/ { [ ${\BIN_OP}, \$item[1] ] }
   | /\\|\\||or|&&|and/  { [ ${\BIN_OP}, \$item[1] ] }
   | /[-+*\\/\%]/        { [ ${\BIN_OP}, \$item[1] ] }

function_call : function_name '(' args ')'
                { [ ${\FUNCTION_CALL}, \$item[1], \$item[3] ] }
              | function_name ...'(' subexpression
                { [ ${\FUNCTION_CALL}, \$item[1], [ \$item[3] ] ] }
              | function_name '(' ')'
                { [ ${\FUNCTION_CALL}, \$item[1] ] }

function_name : /[A-Za-z_][A-Za-z0-9_]*/ { \$item[1] }

args : <leftop: subexpression ',' subexpression>

var : /[A-Za-z_][A-Za-z0-9_]*/ { \\\$item[1] }

literal : /-?\\d*\\.\\d+/  { \$item[1] }
        | /-?\\d+/         { \$item[1] }
        | <perl_quotelike> { \$item[1][2] }

Thanks!
-sam

PS: I'd also be interested in a way to remove the need for parens entirely ("n == 10 || n == 50") which seems like it could be the same problem...

Replies are listed 'Best First'.
Re: Left-associative binary operators in Parse::RecDescent
by blokhead (Monsignor) on Dec 14, 2004 at 20:01 UTC
    What you want is grammar stratification to give you precedence levels (as opposed to relying on rule matching order as you do now). Put the lowest-precedence stuff at the top/outermost level of the grammar, and the highest-precedence stuff at the bottom/innermost level. This is one standard technique for writing grammars.

    The basic structure of stratified rules is like this (at least for binary operations):

    ## for right-associative ops:
    lower_prec_expr : higher_prec_expr lower_prec_op lower_prec_expr
                    | higher_prec_expr

    ## for left-associative ops:
    ## warning: written this way, it is left-recursive.
    lower_prec_expr : lower_prec_expr lower_prec_op higher_prec_expr
                    | higher_prec_expr

    You will have many layers of this type of thing (one for each level of precedence). You can think of it as the grammar matching the outermost expressions first (which have the lowest precedence, so are conceptually applied last), and then filling in the higher-precedence details later.

    Here's an example for your purposes:

    use Parse::RecDescent;
    use Data::Dumper;

    my $grammar = q{
        startrule: logical_expr

        logical_expr: comparison_expr logical_op logical_expr
                        { [@item[1,2,3]] }
                    | comparison_expr

        logical_op: /\\|\\||or|&&|and/

        comparison_expr: paren_expr comparison_op comparison_expr
                           { [@item[1,2,3]] }
                       | paren_expr

        comparison_op: />=?|<=?|!=|==|le|ge|eq|ne|lt|gt/

        paren_expr: '(' logical_expr ')' { $item[2] }
                  | atom

        atom: /[A-Za-z_][A-Za-z0-9_]*/
    };

    my $parser = new Parse::RecDescent $grammar;

    for (<DATA>) {
        print $_;
        print Dumper $parser->startrule($_);
        print "============\n";
    }

    __DATA__
    foo == bar || baz < bar
    foo == bar || baz < bar || foo && bar
    (foo || bar) || baz

    This seems to output the kind of thing you want. Notice how paren_expr must refer all the way back to the lowest-level precedence logical_expr!

    Postscript: Refer to the Dragon Book or a similar parsing reference for more on stratification. (could have sworn the Dragon Book mentioned this, but it's not in the index)

    Update: Limbic~Region noticed that I had swapped left & right associativity in my examples. Now fixed.

    blokhead

      The good news: I applied your technique and it seems to work! The new grammar will parse everything the old one did and the new syntax too.

      The bad news: it's incredibly slow! The test suite which used to run in a few seconds now takes so long that I have to kill it on the complex grammar test.

      Here's the grammar I'm using now:

        Yeah, look at all of that crazy recursion! I was trying to steer you away from that in my other reply, but I had this freaking meeting to go to, so I guess I wasn't very clear in my explanation because I was in a rush. Apologies for that; I'll be more thorough here.

        Like I said in my other reply, a binop is essentially a string of expressions at the same precedence that are executed in sequence. (Earlier, you only had 1 level of precedence, so you could get away with just changing one rule.) The classic recursive/BNF way to do this is with a tail/right-recursive call back to the original rule. Unfortunately, recursion is really slow, especially when something has to back-track. However, tail-recursion can also be implemented as a loop, but only if the grammar engine has the capabilities.

        Luckily, PRD does in fact have these capabilities via leftop, (s), and (s?). So, if we take a rule like:

        logical_expr : comparison_expr logical_op logical_expr
                     | comparison_expr

        We can turn this into the loop form:

        logical_expr : comparison_expr mult_logical_expr(s?)
        mult_logical_expr: logical_op comparison_expr

        # or:
        logical_expr : comparison_expr (logical_op comparison_expr)(s?)

        And so, we continue down the precedence ladder, until we get to paren_expr, which should really be called something like "single_expr". paren_expr then needs to recurse back up to the top of the precedence list like you have, as this is unavoidable. Of course, we can capture more precisely with PRD, and so with these modifications, we come up with:

        logical_expr : comparison_expr (logical_op comparison_expr { [ \@item[1..2] ] })(s?)
                       { [ \$item[1], map { \@\$_ } \@{\$item[2]} ] }

        comparison_expr : math_expr (comparison_op math_expr { [ \@item[1..2] ] })(s?)
                          { [ \$item[1], map { \@\$_ } \@{\$item[2]} ] }

        math_expr : paren_expr (math_op paren_expr { [ \@item[1..2] ] })(s?)
                    { [ \$item[1], map { \@\$_ } \@{\$item[2]} ] }

        And, if you test it, you'll find it runs very fast now.
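The actions in the rules above return a flat list of the form [ operand, op, operand, op, operand, ... ]. If the tree walker expects nested left-associative nodes instead, one possible post-processing step is to fold that list from the left. This is only a sketch: `fold_left` is a hypothetical helper, not part of Parse::RecDescent or HTML::Template::Expr, and it assumes operators arrive as plain strings and that each node is laid out as [ op, left, right ].

```perl
use strict;
use warnings;

# Hypothetical helper: fold a flat [ operand, op, operand, ... ] list
# into left-nested [ op, left, right ] nodes.
sub fold_left {
    my ($flat) = @_;
    my @items = @$flat;
    my $tree  = shift @items;            # the first operand seeds the tree
    while (@items) {
        my ($op, $right) = splice @items, 0, 2;
        $tree = [ $op, $tree, $right ];  # earlier operators end up nested deeper
    }
    return $tree;
}

my $tree = fold_left([ 'a', '||', 'b', '||', 'c' ]);
# $tree is now [ '||', [ '||', 'a', 'b' ], 'c' ]
```

A single-operand list such as [ 'a' ] comes back as the bare operand, so rules that match no operator do not add a node of their own.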

        A couple more notes:

        1. It really is a lot easier to write (and especially read!) PRD grammars if you use a single-quoted heredoc. You won't have to backslash anything that you don't need to.
        2. PRD automatically returns the last item matched in a production, so you can get rid of most of the { \$item[1] } productions.
        3. You might want to think about using an auto-action (or even just regular actions) that will differentiate the result of each rule into a separate dynamically-generated class. That will make it much easier to walk the resulting tree, especially if you have several levels of paren-nesting. For instance, something like:
          $::RD_AUTOACTION = q { bless [@item[1..$#item]], "$item[0]_node" };
          will bless the resulting match array into a class that matches the rule name that was matched! The resulting object returned by the $parser->startrule will be a very nice tree that you can walk very easily with the Visitor pattern.
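To make the tree-walking idea in note 3 concrete, here is a rough sketch of dispatching on the class a node was blessed into. The tree is built by hand as a stand-in for parser output; the rule names (`atom`, `comparison_expr`) and the handler table `%visit` are assumptions for illustration only, following the "<rule>_node" naming from the auto-action.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hand-built stand-in for what the auto-action might return
# (hypothetical rule names, not taken from a real parse).
my $tree = bless [
    bless([ 'color' ], 'atom_node'),
    'eq',
    bless([ 'blue' ], 'atom_node'),
], 'comparison_expr_node';

# One handler per node class; plain strings (operators) pass through as-is.
my %visit;
%visit = (
    atom_node            => sub { $_[0][0] },
    comparison_expr_node => sub {
        my ($node) = @_;
        join ' ', map { blessed($_) ? $visit{ ref $_ }->($_) : $_ } @$node;
    },
);

my $text = $visit{ ref $tree }->($tree);
print "$text\n";
```

The same dispatch table could evaluate the expression instead of printing it; only the handlers change, not the walk.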
      Thanks for the help! I'll give this a try and see if it does the trick.

      -sam

Re: Left-associative binary operators in Parse::RecDescent
by jryan (Vicar) on Dec 14, 2004 at 20:26 UTC

    Precedence discussions aside (it didn't seem like you need precedence levels here), your problem is that you're only trying to match two things in a binary_op. Binary ops are more like "chains of one or more expressions of equal precedence", and since you only seem to have one level of precedence here, we can just do a loop:

    binary_op : '(' subexpression (op subexpression {[\@item[1..2]]})(s?) ')'
                { [ \$item[2], map { \@\$_ } \@{\$item[3]} ] }

    If you want to be able to have an alternative that doesn't need parens, just mirror the same rule without parens. Let me know if you still have trouble...

    P.S.: When writing PRD grammars, it's useful to use a non-interpolating heredoc (e.g. my $grammar = << '__ENDG__') so you don't have to backslash anything. :)

    binary_op : '(' subexpression (op subexpression {[@item[1..2]]})(s?) ')'
                { [ $item[2], map { @$_ } @{$item[3]} ] }
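As a small illustration of the heredoc point, the sketch below builds the same grammar text twice: once in an interpolating string, where every `$` needs a backslash, and once in a single-quoted heredoc, where it does not. The one-line `rule` is a made-up placeholder, not part of the real grammar.

```perl
use strict;
use warnings;

# Interpolating string: each $ meant for the grammar must be escaped.
my $escaped = "rule : 'x' { \$item[1] }\n";

# Single-quoted heredoc: the text is taken literally, no backslashes needed.
my $literal = <<'__ENDG__';
rule : 'x' { $item[1] }
__ENDG__

print "same grammar text\n" if $escaped eq $literal;
```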

      Thanks for the help! I've tried just dropping the '(' and ')' but then I get errors from PRD about the grammar being left-recursive.

      -sam

        Yeah, that's a good point. You'll need to get rid of the left-recursion. To do that, you'd need to get rid of binary_op and subexpression in your subexpression rule. Next, you'll need to factor out all of the paren stuff into a single rule, and then use that rule within the binary_op rule instead of subexpression. Here's an example:

        paren: '(' binary_op ')' { \$item[2] }   # parens belong here, and only here!
             | subexpression

        subexpression: function_call
                     | var
                     | literal
                     | <error>

        binary_op : paren (op paren { [ \@item[1..2] ] })(s?)   # any parenned expression will sink down here
                    { [ \$item[1], map { \@\$_ } \@{\$item[2]} ] }