|Just another Perl shrine|
Operator Associativity and Eliminating Left-Recursion in Parse::RecDescent
I have found documentation on eliminating left-recursion (such as Eliminating Left Recursion in Parse::RecDescent) to be unsatisfactory. Left recursion is usually eliminated at the expense of associativity. This tutorial seeks to address this issue.
The document provides two implementations for every topic covered. The first shows how the topic applies when evaluating the text at parse time. The second shows how the topic applies when building a parse tree. It is probably best to ignore the latter (parse tree creation) until the former (parse-time eval) is understood.
Feedback and criticisms are welcome.
Table of Contents
The Perl binary operators + and - have the same precedence, but that doesn't mean they can be evaluated in any order. For example, consider 4 - 5 + 6.
Operators which are evaluated from left-to-right are left-associative.
Operators which are evaluated from right-to-left are right-associative.
In Perl, binary operators + and - are left-associative, and binary operator ** is right-associative. (Refer to Operator Precedence and Associativity in perlop for the associativity of other operators.)
Grammars do not specify associativity. A grammar simply defines whether a given string is valid in the language represented by the grammar, and associativity is not needed for that purpose.
However, we're rarely just interested in validity check. Parsers that return a parse tree representing the text being parsed and those that evaluate the text being parsed are much more useful. Because Parse::RecDescent processes rules from left to right, grammars can be written in a form that lends itself well to doing these tasks.
The following subsections will enrich these grammars with code to build a parse tree and to evaluate the expression at parse-time. As you will see, no changes will be needed to the grammar.
There is a catch. The theory is solid, but parsers have limitations.
Productions of the form a : a b are called left-recursive. An entire class of parser generators cannot process left-recursive grammars, and Parse::RecDescent belongs to that class. Unfortunately, the left-associative rules presented so far are left-recursive. The remainder of this section will show methods of removing left-recursion from grammars for Parse::RecDescent.
It's easy to parse 4 - 5 + 6 into the list '4', '-', '5', '+', '6'. The following snippet does so:
If we are evaluating at parse-time, we have little choice but to process the sum as a list rather than a binary operator. When building a parse tree, we have two options. We could leave it as is, or we could convert the list into a tree.
The following subsections show how to evaluate the list and how to treeify it.
This method is the same as Method 1, but takes advantage of a Parse::RecDescent feature to improve readability. Parse::RecDescent has a pair of directives to help build lists. <leftop> is designed to build left-associative lists, and <rightop> is designed to build right-associative lists.
Normally, information passes from subrule to superrule. For example, in the following code, rule2 receives the result of rule3. In turn, rule1 receives the result of rule2.
The deeper something is, the sooner it will get executed. In a list, that means the last (right-most) element encountered will be executed first. With left-associative lists, the opposite is needed. With left-associative lists, information needs to flow from the superrule to the subrule. Fortunately, Parse::RecDescent provides a means of passing information to subrules: Subrule argument lists.
Think of each rule as a function, and of each reference to that rule as a function call. (In fact, this is how the compiled grammars are implemented.) Just like functions can have arguments, so can subrules.
Earlier, we ended up with the following rules for right-recursive binary operators:
Unlike left-recursion, Parse::RecDescent has no problem with right-recursion. However, Parse::RecDescent handles rules with productions with identical prefixes very inefficiently.
Just like in algebra, we can factor out the common prefix into another rule.
The complicated part is how to evaluate the expression or build the parse tree when one of the operands is matched by one rule, and the other is matched by a different rule. It turns out that doing this is very similar to eliminating left-recursion.
Just like when eliminating left-recursion, we can build a flat list of the whole chain of powers, and work with that. The difference is that the list will be processed from right to left.
Just like Parse::RecDescent has a directive for creating a flat list for a left-associative operator (<leftop>), it has one to create a flat list for a right-associative operator (<rightop>).
Let's look at the algebra again. We can change
The problem is that we have to pass $item from pow to pow_. We've already seen that we can pass data from one rule to another using subrule arguments. When eliminating left-recursion, we used the subrule argument to form a stack. When improving right-recursion, we simply pass from the main rule to the helper rule.
The following subsections contain complete, working code to parse expressions formed of the +, - and ** binary operators using the Subrule Argument methods. Parentheses are also supported to produce more meaningful results.
In order to support parentheses and to give the operators their proper precedence, the rules used in the upcoming code are slightly different from those seen earlier. Where NUM used to be in the productions, you will now find term (in sum/sum_) and sum (in pow/pow_).
The code of both subsections produce the same output, an uncommented version of the following:
Update Aug 13, 2006: The examples have been simplified. A right-associative operator is used for the right-associative examples. Parse-time eval was placed before parse tree building. Added section on simplifying right-recursion. Small additions were made here and there to improve clarity. It still needs to link to a tutorial on precedence.
Update Jun 13, 2014: Fixed spelling and grammar mistakes identified by hexcoder.
Update Oct 3, 2016: Fixed indexing problem raised by an anonymous monk.