http://qs321.pair.com?node_id=553889

Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent

Synopsis

I have found documentation on eliminating left-recursion (such as Eliminating Left Recursion in Parse::RecDescent) to be unsatisfactory. Left recursion is usually eliminated at the expense of associativity. This tutorial seeks to address this issue.

The document provides two implementations for every topic covered. The first shows how the topic applies when evaluating the text at parse time. The second shows how the topic applies when building a parse tree. It is probably best to ignore the latter (parse tree creation) until the former (parse-time eval) is understood.

Feedback and criticisms are welcome.

1. What is Operator Associativity?

The Perl binary operators + and - have the same precedence, but that doesn't mean they can be evaluated in any order. For example, consider 4 - 5 + 6.

If executed from left-to-right, 4 - 5 + 6 = (4 - 5) + 6 = 5
If executed from right-to-left, 4 - 5 + 6 = 4 - (5 + 6) = -7

Similarly,

If executed from left-to-right, 4 ** 3 ** 2 = (4 ** 3) ** 2 = 4096
If executed from right-to-left, 4 ** 3 ** 2 = 4 ** (3 ** 2) = 262144

Operators which are evaluated from left-to-right are left-associative.

Operators which are evaluated from right-to-left are right-associative.

In Perl, binary operators + and - are left-associative, and binary operator ** is right-associative. (Refer to Operator Precedence and Associativity in perlop for the associativity of other operators.)
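A quick way to confirm these rules is to let Perl evaluate each expression both as written and with the grouping made explicit. The following is only a minimal sketch; the %checks hash is a name made up for this illustration.

```perl
use strict;
use warnings;

# Each entry pairs an unparenthesised expression with the explicitly
# grouped form that Perl's associativity rules imply.
my %checks = (
    '4 - 5 + 6'   => '(4 - 5) + 6',      # left-associative
    '4 ** 3 ** 2' => '4 ** (3 ** 2)',    # right-associative
);

for my $expr (sort keys %checks) {
    my $plain   = eval $expr;
    my $grouped = eval $checks{$expr};
    print "$expr = $plain, same as $checks{$expr} = $grouped\n";
}
```

Both lines of output should show matching values, which is what "left-associative" and "right-associative" mean in practice.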

2. Parsers and Associativity

Grammars do not specify associativity. A grammar simply defines whether a given string is valid in the language represented by the grammar, and associativity is not needed for that purpose.

However, we're rarely interested in just a validity check. Parsers that return a parse tree representing the text being parsed, and those that evaluate the text being parsed, are much more useful. Because Parse::RecDescent processes rules from left to right, grammars can be written in a form that lends itself well to doing these tasks.

Left-associative:

sum : sum /[+-]/ NUM
    | NUM

Right-associative:

pow : NUM '**' pow
    | NUM

The following subsections will enrich these grammars with code to build a parse tree and to evaluate the expression at parse-time. As you will see, no changes will be needed to the grammar.

2.a. Parse-time Evaluation with Associativity

Left-associative:

sum : sum '+' NUM { $item[1] + $item[3] }
    | sum '-' NUM { $item[1] - $item[3] }
    | NUM         { $item[1] }

Right-associative:

pow : NUM '**' pow { $item[1] ** $item[3] }
    | NUM          { $item[1] }

2.b. Building a Parse Tree with Associativity

Left-associative:

sum : sum /[+-]/ NUM { [ @item[2,1,3] ] }
    | NUM            { $item[1] }

Right-associative:

pow : NUM '**' pow { [ @item[2,1,3] ] }
    | NUM          { $item[1] }
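To make the difference between the two tree shapes concrete, here is a small sketch that builds them by hand in the [ op, left, right ] layout and walks them. It assumes plain numbers as leaves, and eval_tree is a helper invented for this illustration, not part of Parse::RecDescent.

```perl
use strict;
use warnings;

# Hypothetical hand-built trees in the [ op, left, right ] layout,
# with plain numbers as leaves (an assumption made for this sketch).
my $left_tree  = [ '+',  [ '-', 4, 5 ], 6 ];     # 4 - 5 + 6
my $right_tree = [ '**', 4, [ '**', 3, 2 ] ];    # 4 ** 3 ** 2

# Recursively evaluate a node; anything that is not an array
# reference is treated as a number.
sub eval_tree {
    my ($node) = @_;
    return $node if !ref $node;
    my ($op, $l, $r) = @$node;
    my ($lv, $rv) = (eval_tree($l), eval_tree($r));
    return $op eq '+'  ? $lv + $rv
         : $op eq '-'  ? $lv - $rv
         : $op eq '**' ? $lv ** $rv
         : die "Unknown operator '$op'\n";
}

print eval_tree($left_tree),  "\n";
print eval_tree($right_tree), "\n";
```

The left-associative tree nests on the left and the right-associative tree nests on the right, so the deepest node is always the operation that has to happen first.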

3. Eliminating Left-Recursion

There is a catch. The theory is solid, but parsers have limitations.

Productions of the form a : a b are called left-recursive. An entire class of parser generators cannot process left-recursive grammars, and Parse::RecDescent belongs to that class. Unfortunately, the left-associative rules presented so far are left-recursive. The remainder of this section will show methods of removing left-recursion from grammars for Parse::RecDescent.
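The reason can be sketched in plain Perl by translating a left-recursive rule into a function by hand. This is only an illustration of the general problem for recursive-descent parsers, not Parse::RecDescent's actual generated code; naive_sum and $MAX_DEPTH are names made up here, and the depth guard exists solely so the example terminates.

```perl
use strict;
use warnings;

my $MAX_DEPTH = 50;    # arbitrary guard so the sketch terminates

# Naive translation of:  sum : sum /[+-]/ NUM | NUM
# $pos is the current offset into the token list.
sub naive_sum {
    my ($tokens, $pos, $depth) = @_;
    die "Gave up at depth $depth, still at position $pos\n"
        if $depth > $MAX_DEPTH;

    # The first production starts with 'sum', so we recurse before
    # consuming anything: $pos never advances.
    return naive_sum($tokens, $pos, $depth + 1);
}

my $ok = eval { naive_sum([ 4, '-', 5, '+', 6 ], 0, 0); 1 };
print $ok ? "parsed\n" : "error: $@";
```

Because the recursive call happens before any token is consumed, every call looks exactly like the previous one and the recursion never bottoms out.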

3.a. Method 1: Create a Flat List, and Reconstruct

It's easy to parse 4 - 5 + 6 into the list '4', '-', '5', '+', '6'. The following snippet does so:

sum  : NUM sum_ { [ $item[1], @{$item[2]} ] }

sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] }
     | { [] }

If we are evaluating at parse-time, we have little choice but to process the sum as a list rather than a binary operator. When building a parse tree, we have two options. We could leave it as is, or we could convert the list into a tree.

The following subsections show how to evaluate the list and how to treeify it.

3.a.i. ...to Evaluate the Text at Parse-time

{
    sub eval_sum {
        my $acc = shift(@_);
        while (@_) {
            my $op = shift(@_);
            if    ($op eq '+') { $acc += shift(@_); }
            elsif ($op eq '-') { $acc -= shift(@_); }
        }
        return $acc;
    }
}

sum  : NUM sum_ { eval_sum($item[1], @{$item[2]}) }

sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] }
     | { [] }

3.a.ii. ...to Build a Parse Tree

{
    sub treeify {
        my $t = shift(@_);
        $t = [ shift(@_), $t, shift(@_) ] while @_;
        return $t;
    }
}

sum  : NUM sum_ { treeify($item[1], @{$item[2]}) }

sum_ : /[+-]/ NUM sum_ { [ $item[1], $item[2], @{$item[3]} ] }
     | { [] }
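To see what treeify does with the flat list, it can be run outside the grammar. The sketch below reproduces the helper from 3.a.ii and feeds it the list for 4 - 5 + 6 by hand; using Data::Dumper to print the result is simply a convenience chosen for this example.

```perl
use strict;
use warnings;
use Data::Dumper ();

# Same helper as in 3.a.ii, reproduced so the sketch runs stand-alone.
sub treeify {
    my $t = shift(@_);
    $t = [ shift(@_), $t, shift(@_) ] while @_;
    return $t;
}

# The flat list the sum/sum_ rules would produce for "4 - 5 + 6".
my @flat = ( 4, '-', 5, '+', 6 );

# Each pass wraps the tree built so far as the LEFT child of a new
# node, which is what makes the result left-associative.
my $tree = treeify(@flat);

$Data::Dumper::Indent = 0;
$Data::Dumper::Terse  = 1;
print Data::Dumper::Dumper($tree), "\n";
```

The first pass produces the node for 4 - 5, and the second pass makes that node the left operand of the +.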

3.b. Method 2: Create a Flat List Using <leftop>, and Reconstruct

This method is the same as Method 1, but takes advantage of a Parse::RecDescent feature to improve readability. Parse::RecDescent has a pair of directives to help build lists. <leftop> is designed to build left-associative lists, and <rightop> is designed to build right-associative lists.

3.b.i. ...to Evaluate the Text at Parse-time

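{
    sub eval_sum {
        my $acc = shift(@_);
        while (@_) {
            my $op = shift(@_);
            if    ($op eq '+') { $acc += shift(@_); }
            elsif ($op eq '-') { $acc -= shift(@_); }
        }
        return $acc;
    }
}

sum : <leftop: NUM /([+-])/ NUM> { eval_sum(@{$item[1]}) }

The capturing parentheses in /([+-])/ matter: when the operator is given as a pattern, <leftop> only includes it in the returned list if the pattern captures it. Without them, eval_sum would receive the operands alone.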

3.b.ii. ...to Build a Parse Tree

{
    sub treeify {
        my $t = shift(@_);
        $t = [ shift(@_), $t, shift(@_) ] while @_;
        return $t;
    }
}

sum : <leftop: NUM /([+-])/ NUM> { treeify(@{$item[1]}) }

3.c. Method 3: Using a Subrule Argument

Normally, information passes from subrule to superrule. For example, in the following code, rule2 receives the result of rule3. In turn, rule1 receives the result of rule2.

rule1: token rule2
rule2: token rule3
rule3: token

The deeper something is, the sooner it will get executed. In a list, that means the last (right-most) element encountered will be executed first. With left-associative lists, the opposite is needed: information needs to flow from the superrule to the subrule. Fortunately, Parse::RecDescent provides a means of passing information to subrules: subrule argument lists.

Think of each rule as a function, and of each reference to that rule as a function call. (In fact, this is how the compiled grammars are implemented.) Just like functions can have arguments, so can subrules.
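Taking the function analogy literally, the idea can be sketched in plain Perl before looking at the grammar form. The functions below are a hand-written analogue, not Parse::RecDescent output: sum and sum_ are ordinary subs, the token list stands in for the input text, and $acc plays the role that $arg[0] plays in a rule.

```perl
use strict;
use warnings;

# Plain-Perl analogue of a rule that passes an accumulator down:
#   sum  : NUM sum_[ <NUM> ]
#   sum_ : op NUM sum_[ <acc> op <NUM> ]  |  <acc>
# @$tokens is consumed from the front.

sub sum {
    my ($tokens) = @_;
    my $num = shift @$tokens;        # NUM
    return sum_($tokens, $num);      # pass it down to the subrule
}

sub sum_ {
    my ($tokens, $acc) = @_;
    return $acc if !@$tokens;        # empty production: return the argument
    my $op  = shift @$tokens;        # '+' or '-'
    my $num = shift @$tokens;        # NUM
    return sum_($tokens, $op eq '+' ? $acc + $num : $acc - $num);
}

print sum([ 4, '-', 5, '+', 6 ]), "\n";
```

Each call combines the accumulated value with the operand it has just read before recursing, so the left-most operation is applied first even though the recursion itself runs to the right.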

3.c.i. ...to Evaluate the Text at Parse-time

sum  : NUM sum_[ $item[1] ]

sum_ : '+' NUM sum_[ $arg[0] + $item[2] ]
     | '-' NUM sum_[ $arg[0] - $item[2] ]
     | { $arg[0] }

3.c.ii. ...to Build a Parse Tree

sum  : NUM sum_[ $item[1] ]

sum_ : '+' NUM sum_[ [ $item[1], $arg[0], $item[2] ] ]
     | '-' NUM sum_[ [ $item[1], $arg[0], $item[2] ] ]
     | { $arg[0] }

4. Improving Right-Recursion

Earlier, we ended up with the following rules for right-recursive binary operators:

pow : NUM '**' pow
    | NUM

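Unlike left-recursion, Parse::RecDescent has no problem with right-recursion. However, Parse::RecDescent handles rules whose productions share a common prefix very inefficiently: when the first production fails partway through, the next production parses the shared prefix all over again.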

Just like in algebra, we can factor out the common prefix into another rule.

pow  : NUM pow_

pow_ : '**' pow
     |

The complicated part is how to evaluate the expression or build the parse tree when one of the operands is matched by one rule, and the other is matched by a different rule. It turns out that doing this is very similar to eliminating left-recursion.
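To get a feel for what factoring out the common prefix saves, the two forms of the rule can be hand-translated into plain Perl and the number of attempts to match NUM counted. This is only a sketch of the backtracking behaviour, not Parse::RecDescent's generated code; match_num, pow_unfactored and pow_factored are names invented for the illustration.

```perl
use strict;
use warnings;

my $num_calls = 0;

# Terminal matcher: returns the next position on success, undef otherwise.
sub match_num {
    my ($tokens, $pos) = @_;
    $num_calls++;
    my $tok = $tokens->[$pos];
    return defined($tok) && $tok =~ /^\d+$/ ? $pos + 1 : undef;
}

# pow : NUM '**' pow | NUM        (both productions start with NUM)
sub pow_unfactored {
    my ($tokens, $pos) = @_;
    my $next = match_num($tokens, $pos);
    if (defined $next && ($tokens->[$next] // '') eq '**') {
        my $end = pow_unfactored($tokens, $next + 1);
        return $end if defined $end;
    }
    return match_num($tokens, $pos);    # second production re-matches NUM
}

# pow : NUM pow_      pow_ : '**' pow |
sub pow_factored {
    my ($tokens, $pos) = @_;
    my $next = match_num($tokens, $pos);
    return undef if !defined $next;
    return $next if ($tokens->[$next] // '') ne '**';    # empty production
    return pow_factored($tokens, $next + 1);
}

my @tokens = ( 4, '**', 3, '**', 2 );

$num_calls = 0;
pow_unfactored(\@tokens, 0);
my $unfactored = $num_calls;

$num_calls = 0;
pow_factored(\@tokens, 0);
my $factored = $num_calls;

print "NUM attempts: unfactored=$unfactored factored=$factored\n";
```

With a bare terminal as the prefix the difference is a single extra match, but when the shared prefix is itself a subrule (as in the working code of section 5, where it is a whole sum), the repeated work compounds with every level of nesting.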

4.a. Method 1: Create a Flat List, and Reconstruct

Just like when eliminating left-recursion, we can build a flat list of the whole chain of powers, and work with that. The difference is that the list will be processed from right to left.

4.a.i. ...to Evaluate the Text at Parse-time

{
    sub eval_pow {
        my $acc = pop(@_);
        while (@_) {
            my $op = pop(@_);
            $acc = pop(@_) ** $acc;
        }
        return $acc;
    }
}

pow  : NUM pow_ { eval_pow($item[1], @{$item[2]}) }

pow_ : '**' NUM pow_ { [ $item[1], $item[2], @{$item[3]} ] }
     | { [] }

4.a.ii. ...to Build a Parse Tree

{
    sub treeify_r {
        my $t = pop;
        $t = [ pop, pop, $t ] while @_;
        return $t;
    }
}

pow  : NUM pow_ { treeify_r($item[1], @{$item[2]}) }

pow_ : '**' NUM pow_ { [ $item[1], $item[2], @{$item[3]} ] }
     | { [] }
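One way to gain confidence in the right-to-left processing is to run eval_pow from 4.a.i on hand-made flat lists and compare against Perl's own ** operator. The sketch below reproduces the helper so it runs on its own; the sample lists are arbitrary choices for the illustration.

```perl
use strict;
use warnings;

# Same helper as in 4.a.i, reproduced so the sketch runs stand-alone.
sub eval_pow {
    my $acc = pop(@_);
    while (@_) {
        my $op = pop(@_);          # always '**' in these lists
        $acc = pop(@_) ** $acc;
    }
    return $acc;
}

for my $flat (
    [ 2 ],
    [ 2, '**', 3 ],
    [ 4, '**', 3, '**', 2 ],
    [ 2, '**', 2, '**', 3 ],
) {
    my $src  = join ' ', @$flat;   # e.g. "4 ** 3 ** 2"
    my $got  = eval_pow(@$flat);
    my $perl = eval $src;          # let Perl evaluate the same text
    print "$src => $got (Perl: $perl)\n";
}
```

Since Perl's ** is right-associative, the two columns should agree for every chain.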

4.b. Method 2: Create a Flat List Using <rightop>, and Reconstruct

Just like Parse::RecDescent has a directive for creating a flat list for a left-associative operator (<leftop>), it has one to create a flat list for a right-associative operator (<rightop>).

4.b.i. ...to Evaluate the Text at Parse-time

{
    sub eval_pow {
        my $acc = pop(@_);
        while (@_) {
            my $op = pop(@_);
            $acc = pop(@_) ** $acc;
        }
        return $acc;
    }
}

pow : <rightop: NUM /(\*\*)/ NUM> { eval_pow(@{$item[1]}) }

4.b.ii. ...to Build a Parse Tree

{
    sub treeify_r {
        my $t = pop;
        $t = [ pop, pop, $t ] while @_;
        return $t;
    }
}

pow : <rightop: NUM /(\*\*)/ NUM> { treeify_r(@{$item[1]}) }

4.c. Method 3: Using a Subrule Argument

Let's look at the algebra again. We can change

pow : NUM '**' pow { $item[1] ** $item[3] }
    | NUM          { $item[1] }

into

pow  : NUM pow_

pow_ : '**' pow { <<pow's $item[1]>> ** $item[2] }
     | { <<pow's $item[1]>> }

The problem is that we have to pass $item[1] from pow to pow_. We've already seen that we can pass data from one rule to another using subrule arguments. When eliminating left-recursion, we used the subrule argument to form a stack. When improving right-recursion, we simply pass from the main rule to the helper rule.

4.c.i. ...to Evaluate the Text at Parse-time

pow  : NUM pow_[ $item[1] ]

pow_ : '**' pow { $arg[0] ** $item[2] }
     | { $arg[0] }

4.c.ii. ...to Build a Parse Tree

pow  : NUM pow_[ $item[1] ]

pow_ : '**' pow { [ $item[1], $arg[0], $item[2] ] }
     | { $arg[0] }
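The same pair of rules can be mimicked with two plain Perl functions, which may make the flow of data easier to follow. This is a hand-written analogue rather than anything Parse::RecDescent generates; pow_rule and pow_tail are names invented here, and $left stands in for $arg[0].

```perl
use strict;
use warnings;

# Plain-Perl analogue of:
#   pow  : NUM pow_[ $item[1] ]
#   pow_ : '**' pow { [ $item[1], $arg[0], $item[2] ] } | { $arg[0] }

sub pow_rule {
    my ($tokens) = @_;
    my $num = shift @$tokens;           # NUM
    return pow_tail($tokens, $num);     # hand the left operand down
}

sub pow_tail {
    my ($tokens, $left) = @_;
    return $left if !@$tokens;          # empty production
    my $op    = shift @$tokens;         # '**'
    my $right = pow_rule($tokens);      # the rest is parsed first...
    return [ $op, $left, $right ];      # ...then combined with $left
}

my $tree = pow_rule([ 4, '**', 3, '**', 2 ]);
# $tree is now [ '**', 4, [ '**', 3, 2 ] ]
```

Unlike the left-associative case, nothing is accumulated on the way down: the argument only carries the left operand across the rule boundary, and the combining happens after the recursive call returns.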

5. Working Code

The following subsections contain complete, working code to parse expressions formed of the +, - and ** binary operators using the Subrule Argument methods. Parentheses are also supported to produce more meaningful results.

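To support parentheses and to give the operators a defined relative precedence, the rules used in the upcoming code differ slightly from those seen earlier. Where NUM used to be in the productions, you will now find term (in sum/sum_) and sum (in pow/pow_). Note that this layering makes ** bind more loosely than + and -, the reverse of Perl's own precedence. The test expressions never mix the two without parentheses, so the results still agree with Perl's.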

The code of both subsections produces the same output, an uncommented version of the following:

Demonstrates left-associativity
4-5+6 = 5 got 5
(4-5)+6 = 5 got 5
4-(5+6) = -7 got -7

Demonstrates right-associativity
4**3**2 = 262144 got 262144
(4**3)**2 = 4096 got 4096
4**(3**2) = 262144 got 262144

5.a. ...to Evaluate the Text at Parse-time

use strict;
use warnings;

use Parse::RecDescent ();

my $grammar = <<'__END_OF_GRAMMAR__';

    {
        use strict;
        use warnings;
    }

    parse : expr /^\Z/ { $item[1] }

    # Just an alias
    expr  : pow

    # vvv lowest precedence

    # pow : sum '**' pow
    #     | sum
    pow   : sum pow_[ $item[1] ]
    pow_  : '**' pow { $arg[0] ** $item[2] }
          | { $arg[0] }

    # sum : sum /[+-]/ term
    #     | term
    sum   : term sum_[ $item[1] ]
    sum_  : '+' term sum_[ $arg[0] + $item[2] ]
          | '-' term sum_[ $arg[0] - $item[2] ]
          | { $arg[0] }

    # ^^^ highest precedence

    term  : '(' expr ')' { $item[2] }
          | /\d+/

__END_OF_GRAMMAR__

my $parser = Parse::RecDescent->new($grammar)
    or die("Bad grammar\n");

foreach my $expr (
    '4-5+6',        # Demonstrates left-associativity
    '(4-5)+6',
    '4-(5+6)',
    '4**3**2',      # Demonstrates right-associativity
    '(4**3)**2',
    '4**(3**2)',
) {
    my $expected = eval $expr;
    my $got = $parser->parse($expr);
    print("$expr = $expected got $got\n");
}

5.b. ...to Build and Evaluate a Parse Tree

use strict;
use warnings;

use Parse::RecDescent ();

my $grammar = <<'__END_OF_GRAMMAR__';

    {
        use strict;
        use warnings;
    }

    parse : expr /^\Z/ { $item[1] }

    # Just an alias
    expr  : pow

    # vvv lowest precedence

    # pow : sum '**' pow
    #     | sum
    pow   : sum pow_[ $item[1] ]
    pow_  : '**' pow { [ $item[1], $arg[0], $item[2] ] }
          | { $arg[0] }

    # sum : sum /[+-]/ term
    #     | term
    sum   : term sum_[ $item[1] ]
    sum_  : /[+-]/ term sum_[ [ $item[1], $arg[0], $item[2] ] ]
          | { $arg[0] }

    # ^^^ highest precedence

    term  : '(' expr ')' { $item[2] }
          | /\d+/ { [ @item ] }

__END_OF_GRAMMAR__

my $parser = Parse::RecDescent->new($grammar)
    or die("Bad grammar\n");

my %eval = (
    term => sub { $_[1] },
    '+'  => sub { eval_node($_[1]) +  eval_node($_[2]) },
    '-'  => sub { eval_node($_[1]) -  eval_node($_[2]) },
    '**' => sub { eval_node($_[1]) ** eval_node($_[2]) },
);

sub eval_node {
    my ($node) = @_;
    $eval{$node->[0]}->(@$node);
}

foreach my $expr (
    '4-5+6',        # Demonstrates left-associativity
    '(4-5)+6',
    '4-(5+6)',
    '4**3**2',      # Demonstrates right-associativity
    '(4**3)**2',
    '4**(3**2)',
) {
    my $expected = eval $expr;
    my $tree = $parser->parse($expr);
    my $got = eval_node($tree);
    print("$expr = $expected got $got\n");
}
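When experimenting with the tree-building version, it can help to look at the shape of the tree rather than only its value. The sketch below renders a tree in the node layout used in 5.b (leaves as [ 'term', value ], operators as [ op, left, right ]) as a fully parenthesised string. to_string is a helper invented for this illustration, and the trees are built by hand to match what the grammar is expected to return, so the sketch runs without Parse::RecDescent.

```perl
use strict;
use warnings;

# Render a 5.b-style tree as a fully parenthesised string.
sub to_string {
    my ($node) = @_;
    my ($type, @kids) = @$node;
    return $kids[0] if $type eq 'term';
    return '(' . to_string($kids[0]) . " $type " . to_string($kids[1]) . ')';
}

# Hand-built trees in the layout the 5.b grammar is expected to produce.
my $sum_tree = [ '+',  [ '-', [ term => 4 ], [ term => 5 ] ], [ term => 6 ] ];
my $pow_tree = [ '**', [ term => 4 ], [ '**', [ term => 3 ], [ term => 2 ] ] ];

print to_string($sum_tree), "\n";    # ((4 - 5) + 6)
print to_string($pow_tree), "\n";    # (4 ** (3 ** 2))
```

In the program above, the same helper could be applied to the result of $parser->parse($expr) to check the grouping directly.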

Update Aug 13, 2006: The examples have been simplified. A right-associative operator is used for the right-associative examples. Parse-time eval was placed before parse tree building. Added section on simplifying right-recursion. Small additions were made here and there to improve clarity. It still needs to link to a tutorial on precedence.

Update Jun 13, 2014: Fixed spelling and grammar mistakes identified by hexcoder.

Update Oct 3, 2016: Fixed indexing problem raised by an anonymous monk.

Re: Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent
by blokhead (Monsignor) on Jun 06, 2006 at 20:02 UTC
    ikegami++ .. I recently tried to get a sane parse tree from a grammar which (like your example scenario) was the result of left-recursion elimination on left-associative grammar rules. I found it to be highly nontrivial.

    I was at least aware of the first two approaches you presented (though it would have taken me a long time to get approach #3a working), but it had not occurred to me to use an accumulation paradigm like you do in #3c. This is a standard trick in making recursive functions tail-recursive (pass forward the intermediate results as an argument, instead of passing back the intermediate results as a return value).

    I probably wouldn't have thought to apply this trick to parsing, though. If you're a theorist like me and view parsing through the lens of formal languages, then passing information from parent rule to child rule is not a natural approach. Good thing there are still practitioners out there ;)

    blokhead

Eliminating the common prefix from a Parse::RecDescent rule
by ikegami (Patriarch) on Aug 13, 2006 at 21:18 UTC

    I made some rather major changes to the parent node. I'm creating this new node so people interested in the topic notice the change.

    Most of the changes were made to improve clarity. Despite the scope of the changes, the node is very similar to its previous incarnation. (That's why I didn't create a new node.)

    The most important change is the new section discussing the elimination of the very inefficient duplication in rules such as

    pow : NUM '**' pow | NUM

    In the spirit of the node in which it is contained, I referred to this as "improving right-recursion", but the concepts can be applied to any rule with productions with a common prefix.

      Thank you for this great tutorial. This was a very helpful introduction.

      I tried to play a bit with your example. In the following code I do not get the Error "Bad Expression" for my second expression although it is not valid.

      use strict;
      use warnings;
      use Parse::RecDescent ();

      my $grammar = <<'__END_OF_GRAMMAR__';
      {
          use strict;
          use warnings;
      }

      {
          sub eval_sum {
              my $acc = shift(@_);
              while (@_) {
                  my $op = shift(@_);
                  if    ($op eq '+') { $acc += shift(@_); }
                  elsif ($op eq '-') { $acc -= shift(@_); }
              }
              return $acc;
          }
      }

      sum : NUM sum_ { eval_sum( ($item[1], @{$item[2]}) ) }

      sum_: /[+-]/ NUM sum_ { $return = [$item[1], $item[2], @{$item[3]}] }
          | { $return = [] }

      NUM : /\d+/ { $return = $item[1] }
      __END_OF_GRAMMAR__

      my $parser = Parse::RecDescent->new($grammar)
          or die("Bad grammar\n");

      foreach my $expr ('4-5+6-2','4*5') {
          my $sum = $parser->sum($expr)
              or die "Bad expression";
          print "$sum" . "\n";
      }

      Why do I not get the "Bad expression" error for my second expression?

      Thank you

      Dirk

        You must check that nothing follows what sum matches.
        evaluate : sum /\Z/ { $item[1] }

        $parser->evaluate($expr)
Re: Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent
by Anonymous Monk on Oct 02, 2016 at 05:58 UTC
    Typo: This line,
    pow_ : '**' pow { [ $item[2], $arg[0], $item[3] ] }
    should instead be:
    pow_ : '**' pow { [ $item[1], $arg[0], $item[2] ] }
    The correct version is there in the code, but not in its earlier description. Very nice article, thanks!

      Fixed.

Re: Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent
by hexcoder (Curate) on Jun 13, 2014 at 23:33 UTC
    Thanks very much, a very very useful tutorial! I especially like the methodical and analytical approach very much.

    Some small things I found in the text:


    The document provides two implementation
    =>
    The document provides two implementations

    have the same preceedence
    =>
    have the same precedence

    The following subsections show how to evaluate the list how to treeify it.
    =>
    The following subsections show how to evaluate the list and how to treeify it.

    In a list, that means the means the last (right-most) element encountered will be executed first.
    =>
    In a list, that means the last (right-most) element encountered will be executed first.

    Fortuantely, Parse::RecDescent provides a means
    =>
    Fortunately, Parse::RecDescent provides a means

    and each reference to that rule
    =>
    and of each reference to that rule

    Just like function can have arguments, so can subrules.
    =>
    Just like functions can have arguments, so can subrules.

    However, Parse::RecDescent handles rule with productions
    =>
    However, Parse::RecDescent handles rules with productions

    Just like Parse::RecDescent has an directive for creating a flat list for a left-associative operator (<lefttop>),
    =>
    Just like Parse::RecDescent has a directive for creating a flat list for a left-associative operator (<leftop>),

    The problem we have to pass $item[1] from pow to pow_.
    =>
    The problem is we have to pass $item[1] from pow to pow_.

    The following subsections contains complete, working code
    =>
    The following subsections contain complete, working code

      If you had posted that a week earlier, it would have been posted on the 8th anniversary of the OP!

      Thanks, fixes applied.

        Well, I just discovered the OP.

        Originally I wanted to email this to you, but I could not find a way in Perlmonks to do that. The private msg function only offers a tiny line, which did not seem to fit.

        So I posted it (which lowered my reputation a tiny bit).

        What would you suggest in this case?

        Thanks hexcoder

Re: Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent
by Anonymous Monk on May 06, 2017 at 21:53 UTC
    Thank you very much for this post, I couldn't find a solution on the net regarding operator associativity with grammar notation. Your post has really opened my eyes, great one!
Re: Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent
by rvosa (Curate) on Jun 23, 2006 at 03:32 UTC
    Great, great post. I wish I understood parsers better.