Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Order of Precedence in Parse::RecDescent grammar

by suaveant (Parson)
on Jun 01, 2005 at 15:20 UTC ( [id://462517]=perlquestion: print w/replies, xml ) Need Help??

suaveant has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to put together a Parse::RecDescent grammar that will read in an expression and build a parse tree for it. So far things have gone pretty well, but the order of precedence is not working for me. My test case shows multiplication having a higher precedence than modulus, which is wrong... Here is the test script
use strict; use Parse::RecDescent; use Data::Dumper; my $grammar = q` startrule: expression /^\Z/ { $item[1]; } # you can also put comments into the grammar # such as to explain that /^\Z/ means 'match the end of the string +' expression: operand operator expression { [@item[1..3]] } | operand { $item[1] } number: /\d+/ { $item[1] } paren_expression: '(' expression ')' { $item[2] } operand: paren_expression { $item[1] }| number { $item[1] } operator: '%' { [@item] } | /[*\/]/ { [@item] } | /[+-]/ { [@item] } | '=' { [@item] } `; my $parser = new Parse::RecDescent( $grammar ) or die "Compile error\n +"; my $st = '3%2*4'; print Dumper($parser->startrule($st)); ##### Printout $VAR1 = [ '3', [ 'operator', '%' ], [ '2', [ 'operator', '*' ], '4' ] ];
evaluating this would give the equivalent of 3%(2*4) which is wrong. I am new to P::RD so I am sure I have just done something stupid along the way... any enlightenment would be much appreciated.

                - Ant
                - Some of my best work - (1 2 3)

Replies are listed 'Best First'.
Re: Order of Precedence in Parse::RecDescent grammar
by ikegami (Patriarch) on Jun 01, 2005 at 18:07 UTC

    Ideally, you'd want:

    expr : bin_op_2 { $item[1] } bin_op_2 : bin_op_2 /[+-]/ bin_op_1 { [ @item[2, 1, 3] ] } | bin_op_1 { $item[1] } bin_op_1 : bin_op_1 /[*\\\/%]/ term { [ @item[2, 1, 3] ] } | term { $item[1] }

    but the module doesn't allow left-recursive grammars. Here's how left-recursion is removed:

    bin_op_2 : bin_op_1 bin_op_2_(s?) { treeify($item[1], map { @$_ } @{$item[2]}); } bin_op_1 : term bin_op_1_(s?) { treeify($item[1], map { @$_ } @{$item[2]}); } bin_op_2_ : /[+-]/ bin_op_1 { [ $item[1], $item[2] ] } bin_op_1_ : /[*\\\/%]/ term { [ $item[1], $item[2] ] }

    Since it returns a list, we need to spend extra effort making it into a tree again. treeify is defined as:

    sub treeify { my $t = shift; $t = [ shift, $t, shift ] while @_; return $t; }

    Parse::RecDescent provides a shortcut in the form of <leftop>:

    # Lowest precedence. bin_op_2 : <leftop: bin_op_1 SUM bin_op_1 > { treeify(@{$item[1]}); } bin_op_1 : <leftop: term PROD term > { treeify(@{$item[1]}); } # Highest precedence. SUM : '+' | '-' PROD : '*' | '/' | '\\' | '%'

    Here's the final product (with a couple of operators added for demonstration purposes):

    You could use the following for assignment, but I think it's slower:

    sub treeify_r { my $t = pop; $t = [ pop, pop, $t ] while @_; return $t; } assign : <rightop: IDENT ASSIGN assign > { treeify_r(@{$item[1]}); } ASSIGN : '='

    Update: Changed
    expr : bin_op_5 { $item[1] }
    to
    expr : assign { $item[1] }

    Changed
    '\'
    to
    '\\'

      This seems to work... I am guessing it is basically the shortcut way to do what blokhead proposed... I was able to get it to return a little less depth in the tree, though it isn't quite perfect.

      Here is my grammar, for posterity's sake.

      my $grammar = <<'GRAMMAR' { use strict; use warnings; sub treeify { my $t = shift; # my($l,$r) = (shift,shift); while(@_) { my($l,$r) = (shift,shift); $t = [ 'op',$l, (ref $t eq 'ARRAY' && !$#$t ? @$t : $t), (ref + $r eq 'ARRAY' && !$#$r ? @$r : $r) ] } return $t; } } startrule: bin_op bin_op: comp_op # Lowest precendance. comp_op: <leftop: add_op COMP add_op> { treeify(@{$item[1]}); } add_op: <leftop: prod_op SUM prod_op > { treeify(@{$item[1]}); } prod_op : <leftop: mod_op PROD mod_op > { treeify(@{$item[1]}); } # Highest precendance. mod_op : <leftop: term MOD term > { treeify(@{$item[1]}); } SUM : '+' | '-' PROD : '*' | '/' MOD : '%' COMP : />=?|<=?|!=|==?|le|ge|eq|ne|lt|gt/ term: function | '(' bin_op ')' { $item[2] } | number | string if: /if/i '(' list_bin_op list_bin_op bin_op ')' { ['func',@item[1,3 +..5]] } concat: /concat/i '(' list_bin_op(s) bin_op ')' { ['func',$item[1],@ +{$item[3]},$item[4]] } left: /left/i '(' list_bin_op bin_op ')' { ['func',@item[1,3,4]] } right: /right/i '(' list_bin_op bin_op ')' { ['func',@item[1,3,4]] } ifnull: /ifnull/i '(' list_bin_op bin_op ')' { ['func',@item[1,3,4]] + } function: if | concat | left | right | ifnull number: /[+-]?\d+/ { $item[1] } string: { $_ = extract_quotelike($text); chop; substr($_,0,1,''); $_ +; } { [@item] } #$thisparser->startrule($item[2],1,$type) list_bin_op: bin_op ',' { $item[1] } GRAMMAR ;
      This is actually parsing MySQL-like statements, so there is a bit in there for some function definitions, seems to work, though. Thanks all!

                      - Ant
                      - Some of my best work - (1 2 3)

        You're giving % higher precedence than *. Are you sure that's correct? According to your grammar, 4*2%3 is equivalent to 4*(2%3) (8) rather than (4*2)%3 (0).

        The words op and func are probably redundant. Do you really need to know that something is a op rather than specifically an addition, substratction, multiplication, etc.?

        Does extract_quotelike remove the quotes? If so, you have a problem if someone create a string called op or func. That's why I had term in the tree. This is the only level you removed, and it hurts you.

        Update: Oops, I'm mistaken on that last one. If it's a ref, it's an op or func. It's it's not, it's a number or string. It's not very orthogonal to flattern terms, but it works for this grammar.

Re: Order of Precedence in Parse::RecDescent grammar
by blokhead (Monsignor) on Jun 01, 2005 at 16:04 UTC
    See Re: Left-associative binary operators in Parse::RecDescent for my preferred technique of doing precedence in grammars (stratification). You have one layer (stratum) for each precedence levels. Your grammar's starting rule is at the lowest precedence and the subrules increase in precedence. Finally, at the bottom level (of highest precedence), you have the rule for parenthesized expressions which recurses back to the top level (of lowest precedence). But this is unambiguous, because of the parentheses in this recursion rule.

    P::RD may have other bells & whistles to help with precedence, but this is the only way I can ever remember to do precedence correctly, and it works with any kind of parser (although you may have to factor out left-recursion in right-associative operators).

    Update: BTW, you can string together all operations of the same precedence level, which is probably more efficient. I.e, parse (a+b)*c*d*e*(f+g) into a shallow parse tree whose root has 5 children, instead of having a deeper parse tree whose nodes always have 2 children. However, for simplicity/uniformity it's often nice to always know that each node of the parse tree is a simple binary operation.

    blokhead

      So... away I went... and that seems to work, BUT!

      where when I parsed 4+1*2 before I got

      $VAR1 = [ '4', '+', [ '1', '*', '2' ] ];
      Now I get
      $VAR1 = [ [ [ [ '4' ] ], '+', [ [ '1' ], '*', [ '2' ] ] ] ];
      Ick... any good way to clean that up? It gets really nasty with more complex statements.
      Here is the new grammar.
      startrule: comparison_expr comparison_expr: additive_expr (comparison_op additive_expr { [ @item[1..2] ] })(s?) { [ $item[1], map { @ $_ } @{$item[2]} ] } comparison_op: />=?|<=?|!=|==?|le|ge|eq|ne|lt|gt/ additive_expr: multiplicative_expr (additive_op multiplicative_expr { [ @item[1..2] ] })(s +?) { [ $item[1], map { @ $_ } @{$item[2]} ] } additive_op: /[+-]/ multiplicative_expr: modulus_expr (multiplicative_op modulus_expr { [ @item[1..2] ] })(s? +) { [ $item[1], map { @ $_ } @{$item[2]} ] } multiplicative_op: /[*\/]/ modulus_expr: paren_expr ('%' paren_expr { [ @item[1..2] ] })(s?) { [ $item[1], map { @ $_ } @{$item[2]} ] } paren_expr: '(' comparison_expr ')' { $item[2] } | number number: /[+-]?\d+/ { $item[1] }

                      - Ant
                      - Some of my best work - (1 2 3)

        That's another reason why it's simpler to have just binary operations at each level (instead of a longer sequence of same-precedence operations). ;)

        Consider the generic outline:

        lower_prec_expr : higher_prec_expr lower_prec_op lower_prec_expr | higher_prec expr
        The problem is that you want your parse to give different semantic values for the different branches. If you take the 2nd branch, it's just a "pass-through" of the semantic value and you don't want to construct a new arrayref around it. If you take the first branch where there is actually an operation, you need to construct an arrayref with the appropriate things.

        You've combined both of these branches into one production in the grammar like this:

        lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s?) { [ "put stuff in arrayref" ] }
        So now you're putting things inside an arrayref even in the "pass-through" case, where there is no operator at this point in the parse.

        To fix this, put a conditional in your semantic rule to distinguish these cases, i.e:

        lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s?) { @{$item[2]} ? [ ... ] : $item[1] }
        That's kinda nasty. Alternatively, have two different productions explicitly in the grammar:
        lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s) { [ ... ] } | higher_prec_expr
        I.e, if there *are* operators here (no question mark on the (s) quantifier), we build an arrayref for the semantic value with the first branch. Otherwise, the second branch has no semantic rule, so it's just a "pass-through" of the semantic value and does not add layers of arrayref.

        The latter is probably the best choice. Just change the (s?)'s to (s)'s and add the other alternation to the grammar.

        blokhead

Re: Order of Precedence in Parse::RecDescent grammar
by ambrus (Abbot) on Jun 01, 2005 at 17:40 UTC

    Here's my guess.

    You have this:

    expression: operand operator expression { [@item[1..3]] } |
    This way, 3%2*4 can not be parsed as (3%2)*4 because according to the grammar, the left side of the operator must be an operand. 3%2 is not an operand, because it's not explicitly parenthisized.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://462517]
Approved by kvale
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (10)
As of 2024-04-16 08:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found