


The good news: I applied your technique and it seems to work! The new grammar will parse everything the old one did and the new syntax too.

The bad news: it's incredibly slow! The test suite, which used to run in a few seconds, now takes so long that I have to kill it during the complex grammar test.

Here's the grammar I'm using now:

    expression : logical_expr /^\$/
                 { \$return = \$item[1]; }

    logical_expr : comparison_expr logical_op logical_expr
                   { [ \$item[2][0], \$item[2][1], \$item[1], \$item[3] ] }
                 | comparison_expr
                   { \$item[1] }

    comparison_expr : math_expr comparison_op comparison_expr
                      { [ \$item[2][0], \$item[2][1], \$item[1], \$item[3] ] }
                    | math_expr
                      { \$item[1] }

    math_expr : paren_expr math_op math_expr
                { [ \$item[2][0], \$item[2][1], \$item[1], \$item[3] ] }
              | paren_expr
                { \$item[1] }

    paren_expr : '(' logical_expr ')'
                 { \$item[2] }
               | atom

    logical_op : /\\|\\||or|&&|and/
                 { [ ${\BIN_OP}, \$item[1] ] }

    comparison_op : /le|ge|eq|ne|lt|gt/
                    { [ ${\BIN_OP}, \$item[1] ] }
                  | />=?|<=?|!=|==/
                    { [ ${\BIN_OP}, \$item[1] ] }

    math_op : /[-+*\\/\%]/
              { [ ${\BIN_OP}, \$item[1] ] }

    atom : function_call
         | var
         | literal

    function_call : function_name '(' args ')'
                    { [ ${\FUNCTION_CALL}, \$item[1], \$item[3] ] }
                  | function_name ...'(' logical_expr
                    { [ ${\FUNCTION_CALL}, \$item[1], [ \$item[3] ] ] }
                  | function_name '(' ')'
                    { [ ${\FUNCTION_CALL}, \$item[1] ] }

    function_name : /[A-Za-z_][A-Za-z0-9_]*/
                    { \$item[1] }

    args : <leftop: logical_expr ',' logical_expr>

    var : /[A-Za-z_][A-Za-z0-9_]*/
          { \\\$item[1] }

    literal : /-?\\d*\\.\\d+/
              { \$item[1] }
            | /-?\\d+/
              { \$item[1] }
            | <perl_quotelike>
              { \$item[1][2] }

Any ideas what might be going wrong?

-sam

Re^3: Left-associative binary operators in Parse::RecDescent
by jryan (Vicar) on Dec 14, 2004 at 23:10 UTC

    Yeah, look at all of that crazy recursion! I was trying to steer you away from that in my other reply, but I had this freaking meeting to go to, so I guess I wasn't very clear in my explanation because I was in a rush. Apologies for that; I'll be more thorough here.

    Like I said in my other reply, a binop is essentially a string of expressions at the same precedence that are evaluated in sequence. (Earlier, you only had one level of precedence, so you could get away with changing just one rule.) The classic recursive/BNF way to do this is with a tail (right-recursive) call back to the original rule. Unfortunately, that recursion is really slow in PRD, especially when something has to backtrack: each time the first production fails because no operator follows, the second production parses the same sub-expression all over again, and that repeats at every precedence level. However, tail recursion can also be implemented as a loop, provided the grammar engine supports it.

    Luckily, PRD does in fact support this via <leftop>, (s), and (s?). So, if we take a rule like:

        logical_expr : comparison_expr logical_op logical_expr
                     | comparison_expr

    We can turn this into the loop form:

        logical_expr : comparison_expr mult_logical_expr(s?)
        mult_logical_expr : logical_op comparison_expr

        # or:
        logical_expr : comparison_expr (logical_op comparison_expr)(s?)
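
Outside of PRD, the loop form boils down to "read one operand, then keep reading (operator, operand) pairs while an operator follows". Here is a minimal hand-rolled sketch of that shape over a pre-split token list. The name parse_chain and the token format are made up for illustration, and this is not how Parse::RecDescent works internally.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: consume one operand, then
# zero or more (operator, operand) pairs, the same shape as
#   rule : operand (op operand)(s?)
sub parse_chain {
    my ($tokens, $is_op) = @_;
    my @out = ( shift @$tokens );                    # first operand
    while (@$tokens && $is_op->($tokens->[0])) {     # an operator follows
        push @out, splice(@$tokens, 0, 2);           # take op and next operand
    }
    return \@out;
}

my @tokens = qw(a and b or c);
my $chain  = parse_chain(\@tokens, sub { $_[0] =~ /^(?:and|or)$/ });
print "@$chain\n";    # prints "a and b or c"
```

Nothing is ever parsed twice here: once an operand has been read it stays read, whether or not an operator follows it.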

    And so, we continue down the precedence ladder, until we get to paren_expr, which should really be called something like "single_expr". paren_expr then needs to recurse back up to the top of the precedence list like you have, as this is unavoidable. Of course, we can capture more precisely with PRD, and so with these modifications, we come up with:

        logical_expr : comparison_expr (logical_op comparison_expr { [ \@item[1..2] ] })(s?)
                       { [ \$item[1], map { \@ \$_ } \@{\$item[2]} ] }

        comparison_expr : math_expr (comparison_op math_expr { [ \@item[1..2] ] })(s?)
                          { [ \$item[1], map { \@ \$_ } \@{\$item[2]} ] }

        math_expr : paren_expr (math_op paren_expr { [ \@item[1..2] ] })(s?)
                    { [ \$item[1], map { \@ \$_ } \@{\$item[2]} ] }

    And, if you test it, you'll find it runs very fast now.
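
Note that the loop rules hand back one flat list per precedence level (operand, operator, operand, ...) instead of the nested nodes the recursive rules built. If the rest of your code wants nested [ BIN_OP, operator, left, right ] nodes, one possible approach is to fold the flat list from left to right, which also gives you left associativity. This is only a sketch: fold_left is a hypothetical helper, and the BIN_OP value is a placeholder because the real constant isn't shown in the thread.

```perl
use strict;
use warnings;

# Placeholder for the BIN_OP constant used in the grammar (assumed value).
use constant BIN_OP => 'BIN_OP';

# Hypothetical helper: fold [ operand, op, operand, op, operand, ... ]
# into left-nested nodes of the form [ BIN_OP, operator, left, right ].
# Each op is expected to be an array ref like [ BIN_OP, '-' ].
sub fold_left {
    my ($flat) = @_;
    my ($tree, @rest) = @$flat;
    while (@rest) {
        my ($op, $rhs) = splice @rest, 0, 2;
        $tree = [ @$op, $tree, $rhs ];    # the tree so far becomes the left child
    }
    return $tree;
}

# 1 - 2 - 3 as a flat list, folded into (1 - 2) - 3
my $flat = [ 1, [ BIN_OP, '-' ], 2, [ BIN_OP, '-' ], 3 ];
my $tree = fold_left($flat);
```

A list with a single operand folds to that operand itself, so expressions without any operator don't pick up an extra wrapper node.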

    A couple more notes:

    1. It really is a lot easier to write (and especially read!) PRD grammars if you use a single-quoted heredoc. You won't have to backslash anything that you don't need to.
    2. PRD automatically returns the last item matched in a production, so you can get rid of most of the { \$item[1] } productions.
    3. You might want to think about using an auto-action (or even just regular actions) that will differentiate the result of each rule into a separate, dynamically generated class. That will make it much easier to walk the resulting tree, especially if you have several levels of paren-nesting. For instance, something like:
      $::RD_AUTOACTION = q { bless [@item[1..$#item]], "$item[0]_node" };
      will bless the resulting match array into a class named after the rule that matched! The object returned by $parser->startrule will then be a very nice tree that you can walk easily with the Visitor pattern.
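
To give a rough idea of what walking such a tree could look like, here is a small visitor that dispatches on the class of each node. The tree is built by hand as a stand-in for parser output, so its exact shape and the class names (math_expr_node, literal_node, math_op_node) are assumptions, as is the PrintVisitor package itself.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hand-built stand-in for a tree the auto-action might produce (assumed shape).
my $tree = bless( [
    bless( [ '1' ], 'literal_node' ),
    bless( [ '+' ], 'math_op_node' ),
    bless( [ '2' ], 'literal_node' ),
], 'math_expr_node' );

package PrintVisitor;

sub new { return bless( { out => [] }, shift ); }

# Call visit_<class> if this visitor defines it; otherwise descend into
# the node's children. Plain strings (terminals) are ignored by default.
sub visit {
    my ($self, $node) = @_;
    return unless defined $node;
    if (my $class = Scalar::Util::blessed($node)) {
        my $method = "visit_$class";
        return $self->$method($node) if $self->can($method);
        $self->visit($_) for @$node;
    }
    elsif (ref($node) eq 'ARRAY') {
        $self->visit($_) for @$node;
    }
    return;
}

sub visit_literal_node { my ($self, $n) = @_; push @{ $self->{out} }, "lit($n->[0])"; return; }
sub visit_math_op_node { my ($self, $n) = @_; push @{ $self->{out} }, "op($n->[0])";  return; }

package main;

my $v = PrintVisitor->new;
$v->visit($tree);
print join(' ', @{ $v->{out} }), "\n";    # prints "lit(1) op(+) lit(2)"
```

Because the dispatch goes by class name, supporting another rule only takes one more visit_<rule>_node method; rules without a handler are simply walked through.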