Order of Precedence in Parse::RecDescent grammar

suaveant has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Order of Precedence in Parse::RecDescent grammar by ikegami (Patriarch) on Jun 01, 2005 at 18:07 UTC
Ideally, you'd want: `expr : bin_op_2 { $item[1] } bin_op_2 : bin_op_2 /[+-]/ bin_op_1 { [ @item[2, 1, 3] ] } \| bin_op_1 { $item[1] } bin_op_1 : bin_op_1 /[\\\/%]/ term { [ @item[2, 1, 3] ] } \| term { $item[1] }` [download] but the module doesn't allow left-recursive grammars. Here's how left-recursion is removed: `bin_op_2 : bin_op_1 bin_op_2_(s?) { treeify($item[1], map { @$_ } @{$item[2]}); } bin_op_1 : term bin_op_1_(s?) { treeify($item[1], map { @$_ } @{$item[2]}); } bin_op_2_ : /[+-]/ bin_op_1 { [ $item[1], $item[2] ] } bin_op_1_ : /[\\\/%]/ term { [ $item[1], $item[2] ] }` [download] Since it returns a list, we need to spend extra effort making it into a tree again. `treeify` is defined as: `sub treeify { my $t = shift; $t = [ shift, $t, shift ] while @_; return $t; }` [download] Parse::RecDescent provides a shortcut in the form of `<leftop>`: `# Lowest precedence. bin_op_2 : <leftop: bin_op_1 SUM bin_op_1 > { treeify(@{$item[1]}); } bin_op_1 : <leftop: term PROD term > { treeify(@{$item[1]}); } # Highest precedence. SUM : '+' \| '-' PROD : '' \| '/' \| '\\' \| '%'` [download] Here's the final product (with a couple of operators added for demonstration purposes): Read more... (2 kB) You could use the following for assignment, but I think it's slower: `sub treeify_r { my $t = pop; $t = [ pop, pop, $t ] while @_; return $t; } assign : <rightop: IDENT ASSIGN assign > { treeify_r(@{$item[1]}); } ASSIGN : '='` [download] Update*: Changed `expr : bin_op_5 { $item[1] }` to `expr : assign { $item[1] }` Changed `'\'` to `'\\'`	[reply] [d/l] [select]
Re^2: Order of Precedence in Parse::RecDescent grammar by suaveant (Parson) on Jun 02, 2005 at 14:26 UTC
This seems to work... I am guessing it is basically the shortcut way to do what blokhead proposed... I was able to get it to return a little less depth in the tree, though it isn't quite perfect. Here is my grammar, for posterity's sake. my $grammar = <<'GRAMMAR' { use strict; use warnings; sub treeify { my $t = shift; # my($l,$r) = (shift,shift); while(@_) { my($l,$r) = (shift,shift); $t = [ 'op',$l, (ref $t eq 'ARRAY' && !$#$t ? @$t : $t), (ref + $r eq 'ARRAY' && !$#$r ? @$r : $r) ] } return $t; } } startrule: bin_op bin_op: comp_op # Lowest precendance. comp_op: <leftop: add_op COMP add_op> { treeify(@{$item[1]}); } add_op: <leftop: prod_op SUM prod_op > { treeify(@{$item[1]}); } prod_op : <leftop: mod_op PROD mod_op > { treeify(@{$item[1]}); } # Highest precendance. mod_op : <leftop: term MOD term > { treeify(@{$item[1]}); } SUM : '+' \| '-' PROD : '*' \| '/' MOD : '%' COMP : />=?\|<=?\|!=\|==?\|le\|ge\|eq\|ne\|lt\|gt/ term: function \| '(' bin_op ')' { $item[2] } \| number \| string if: /if/i '(' list_bin_op list_bin_op bin_op ')' { ['func',@item[1,3 +..5]] } concat: /concat/i '(' list_bin_op(s) bin_op ')' { ['func',$item[1],@ +{$item[3]},$item[4]] } left: /left/i '(' list_bin_op bin_op ')' { ['func',@item[1,3,4]] } right: /right/i '(' list_bin_op bin_op ')' { ['func',@item[1,3,4]] } ifnull: /ifnull/i '(' list_bin_op bin_op ')' { ['func',@item[1,3,4]] + } function: if \| concat \| left \| right \| ifnull number: /[+-]?\d+/ { $item[1] } string: { $_ = extract_quotelike($text); chop; substr($_,0,1,''); $_ +; } { [@item] } #$thisparser->startrule($item[2],1,$type) list_bin_op: bin_op ',' { $item[1] } GRAMMAR ; [download] This is actually parsing MySQL-like statements, so there is a bit in there for some function definitions, seems to work, though. Thanks all! - Ant - Some of my best work - (1 2 3)	[reply] [d/l]
Re^3: Order of Precedence in Parse::RecDescent grammar by ikegami (Patriarch) on Jun 02, 2005 at 17:00 UTC
You're giving `%` higher precedence than ``. Are you sure that's correct? According to your grammar, `42%3` is equivalent to `4(2%3)` (8) rather than `(42)%3` (0). The words `op` and `func` are probably redundant. Do you really need to know that something is a op rather than specifically an addition, substratction, multiplication, etc.? Does `extract_quotelike` remove the quotes? If so, you have a problem if someone create a string called `op` or `func`. That's why I had `term` in the tree. This is the only level you removed, and it hurts you. Update: Oops, I'm mistaken on that last one. If it's a ref, it's an op or func. It's it's not, it's a number or string. It's not very orthogonal to flattern terms, but it works for this grammar.	[reply] [d/l] [select]
Re^4: Order of Precedence in Parse::RecDescent grammar by suaveant (Parson) on Jun 02, 2005 at 19:54 UTC
Re: Order of Precedence in Parse::RecDescent grammar by blokhead (Monsignor) on Jun 01, 2005 at 16:04 UTC
See Re: Left-associative binary operators in Parse::RecDescent for my preferred technique of doing precedence in grammars (stratification). You have one layer (stratum) for each precedence levels. Your grammar's starting rule is at the lowest precedence and the subrules increase in precedence. Finally, at the bottom level (of highest precedence), you have the rule for parenthesized expressions which recurses back to the top level (of lowest precedence). But this is unambiguous, because of the parentheses in this recursion rule. P::RD may have other bells & whistles to help with precedence, but this is the only way I can ever remember to do precedence correctly, and it works with any kind of parser (although you may have to factor out left-recursion in right-associative operators). Update: BTW, you can string together all operations of the same precedence level, which is probably more efficient. I.e, parse (a+b)cde(f+g) into a shallow parse tree whose root has 5 children, instead of having a deeper parse tree whose nodes always have 2 children. However, for simplicity/uniformity it's often nice to always know that each node of the parse tree is a simple binary operation. blokhead	[reply]
Re^2: Order of Precedence in Parse::RecDescent grammar by suaveant (Parson) on Jun 01, 2005 at 17:12 UTC
So... away I went... and that seems to work, BUT! where when I parsed 4+12 before I got `$VAR1 = [ '4', '+', [ '1', '', '2' ] ];` [download] Now I get `$VAR1 = [ [ [ [ '4' ] ], '+', [ [ '1' ], '', [ '2' ] ] ] ];` [download] Ick... any good way to clean that up? It gets really nasty with more complex statements. Here is the new grammar. startrule: comparison_expr comparison_expr: additive_expr (comparison_op additive_expr { [ @item[1..2] ] })(s?) { [ $item[1], map { @ $_ } @{$item[2]} ] } comparison_op: />=?\|<=?\|!=\|==?\|le\|ge\|eq\|ne\|lt\|gt/ additive_expr: multiplicative_expr (additive_op multiplicative_expr { [ @item[1..2] ] })(s +?) { [ $item[1], map { @ $_ } @{$item[2]} ] } additive_op: /[+-]/ multiplicative_expr: modulus_expr (multiplicative_op modulus_expr { [ @item[1..2] ] })(s? +) { [ $item[1], map { @ $_ } @{$item[2]} ] } multiplicative_op: /[\/]/ modulus_expr: paren_expr ('%' paren_expr { [ @item[1..2] ] })(s?) { [ $item[1], map { @ $_ } @{$item[2]} ] } paren_expr: '(' comparison_expr ')' { $item[2] } \| number number: /[+-]?\d+/ { $item[1] } [download] - Ant - Some of my best work - (1 2 3)	[reply] [d/l] [select]
Re^3: Order of Precedence in Parse::RecDescent grammar by blokhead (Monsignor) on Jun 01, 2005 at 18:21 UTC
That's another reason why it's simpler to have just binary operations at each level (instead of a longer sequence of same-precedence operations). ;) Consider the generic outline: `lower_prec_expr : higher_prec_expr lower_prec_op lower_prec_expr \| higher_prec expr` [download] The problem is that you want your parse to give different semantic values for the different branches. If you take the 2nd branch, it's just a "pass-through" of the semantic value and you don't want to construct a new arrayref around it. If you take the first branch where there is actually an operation, you need to construct an arrayref with the appropriate things. You've combined both of these branches into one production in the grammar like this: `lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s?) { [ "put stuff in arrayref" ] }` [download] So now you're putting things inside an arrayref even in the "pass-through" case, where there is no operator at this point in the parse. To fix this, put a conditional in your semantic rule to distinguish these cases, i.e: `lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s?) { @{$item[2]} ? [ ... ] : $item[1] }` [download] That's kinda nasty. Alternatively, have two different productions explicitly in the grammar: `lower_prec_expr : higher_prec_expr (lower_prec_op lower_prec_expr)(s) { [ ... ] } \| higher_prec_expr` [download] I.e, if there are operators here (no question mark on the (s) quantifier), we build an arrayref for the semantic value with the first branch. Otherwise, the second branch has no semantic rule, so it's just a "pass-through" of the semantic value and does not add layers of arrayref. The latter is probably the best choice. Just change the (s?)'s to (s)'s and add the other alternation to the grammar. blokhead	[reply] [d/l] [select]
Re: Order of Precedence in Parse::RecDescent grammar by ambrus (Abbot) on Jun 01, 2005 at 17:40 UTC
Here's my guess. You have this: `expression: operand operator expression { [@item[1..3]] } \|` [download] This way, `3%24` can not be parsed as `(3%2)4` because according to the grammar, the left side of the operator must be an operand. `3%2` is not an operand, because it's not explicitly parenthisized.	[reply] [d/l] [select]


go ahead... be a heretic
	PerlMonks