Working Notes on Parse::RecDescent
Warning:
Since this isn't a tutorial, the following assumes that you are familiar with the module in question, at least enough that the things mentioned are not a complete mystery. Another thing to be aware of is that these are answers that I found along the way; what may have been obvious to me may not be to you. And certainly vice versa! I'm writing this down because had I known any of this, I would have saved time and effort. Your mileage may well vary. Last caveat: any or all of this could be suspect, and certainly better folk than I may have better ways of dealing with any or all of this (speak up if this is you...). All I know is that it works for me! (Got that line from tech support...)
Random Tips
- List alternates on separate lines. That means, not this:
    name: 'match' | 'name' | 'mode' | 'priority'
but this:
    name: 'match'
        | 'name'
        | 'mode'
        | 'priority'
- Precede alternates with '|' rather than follow. Because eventually what you really want is:
    name: 'match'    { [$item[0],$item[1]] }
        | 'name'     { [$item[0],$item[1]] }
        | 'mode'     { [$item[0],$item[1]] }
        | 'priority' { [$item[0],$item[1]] }
- Use sub AUTOLOAD. Due to a less than careful reading of the Camel book, I'd thought that this only worked in '.pm' files, i.e. as part of a package. But a little thinking reminded me that everything is a package, so this magical feature is available to anyone with the need. Here is what I use:
    sub AUTOLOAD {
        my $tree = shift;
        our $AUTOLOAD;    # fully qualified name of the sub that was called
        print STDERR "@@@ $AUTOLOAD(\$tree) @@@\n";
        recurse($tree);   # recurse() is a tree walker, not shown here
    }
“But what's it good for?” you ask. Well, as the book points out, when a subroutine is undefined, AUTOLOAD is called with the same arguments that would have been passed to the original subroutine. This is useful in at least two cases: when you want to see what was sent to a given function, and when you haven't written the missing code. As an example, during the code and test cycle I'll make a change to some portion of the parse tree. If it is an addition, I'll hold off on writing the necessary code until I can confirm my 'expectations' about what the new code will be sent. If, on the other hand, a change causes a problem, then it is easy enough to rename the affected functions (I just prepend an underbar to the function name, so the call falls through to AUTOLOAD) and then analyze the resulting display of information. It is not a particularly sophisticated technique (a semi-modern version of the old IBM core dump), but I use it because it works, not because it's trendy!
- Use:
- use strict;
- no strict "refs";
- use warnings;
- use diagnostics;
- use Parse::RecDescent;
- use Data::Dumper;
- use Carp::Assert;
- use Carp;
- Use print Dumper $whatever,"\n"; as needed. Just because you are lost in a maze of twisty little passages, doesn't mean you can't get a decent road map.
- In addition to Dumper, don't hesitate to roll your own tree walkers. Code reuse is good, but sometimes it isn't what you need. Thing to remember here is that the parser returns a parse tree, nothing too complicated, either the 'current thing' is an array or it's not. If not then do something with the data, otherwise move to the next level and repeat the process.
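The "array or not" rule above can be sketched in a few lines. This is only an illustration, not the walker used in these notes: the name walk, the indentation scheme, and the hand-built sample tree (shaped like what the [@item[0..$#item]] auto action returns) are all assumptions.

```perl
use strict;
use warnings;

# Minimal hand-rolled walker (hypothetical helper, not from the notes).
# If the current thing is an array, descend one level;
# otherwise record the leaf, indented by its depth.
sub walk {
    my ( $node, $depth, $out ) = @_;
    if ( ref($node) eq 'ARRAY' ) {
        walk( $_, $depth + 1, $out ) for @$node;
    }
    else {
        push @$out, ( '  ' x $depth ) . $node;
    }
    return $out;
}

# Hand-built sample tree (an assumption): rule name first, then values.
my $tree = [ 'startrule', [ [ 'name', 'match' ], [ 'name', 'mode' ] ] ];

my $lines = walk( $tree, 0, [] );
print "$_\n" for @$lines;
```

Collecting the leaves into an array rather than printing inside walk keeps the walker reusable: the same routine can feed a printer, a counter, or a comparison against saved output.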
- It is nice to say “Use the source Luke!” but the truth is that Parse::RecDescent is pretty opaque. You are probably better off using the tried and true print statement instead.
- Use $::RD_AUTOACTION = q { [@item[0..$#item]] }; until you know what you are doing and then replace each default action with one tailored to your needs.
- $item[0] is your friend; keep it around. A crude version of compiling is just two steps: build the parse tree and then execute the tree! Think of it this way: with almost no effort at all, the first item in an array in the tree is the rule name, so why not also think of it as the name of the function that will process the tree at that point? Here is my 'standard' sub startrule; the call in the else branch is the line that depends on $item[0]:
    sub startrule {
        my $tree = shift;
        foreach (@$tree) {
            if ( ref($_) eq 'ARRAY' ) {
                if ( ref( $_->[0] ) eq 'ARRAY' ) {
                    startrule($_);    # nested list of nodes: descend
                }
                else {
                    # $_->[0] is the rule name; call the sub of that name
                    # (a symbolic reference, hence no strict "refs")
                    &{ $_->[0] }($_);
                }
            }
        }
    }
Like I said, keep it around, it's useful!
- Place trial input in the __DATA__ section. This allows regression testing, so use discretion when weeding this out.
- Refactor constantly. Portions of the grammar will often become obsolete and will need to be pruned. Because this is the case, the previous item becomes even more important. Even if you have a fairly complete design before you commit to code, the process of building and testing will suggest changes, and changes will result in a certain amount of obsolescence; hence the need for pruning shears!
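The $item[0] and AUTOLOAD tips above combine naturally: dispatch on the rule name, and let AUTOLOAD report any rule that has no handler yet. The following is a minimal sketch under stated assumptions, not the code from these notes. The names dispatch, name, and @log are hypothetical, and the tree is built by hand so the example runs without Parse::RecDescent.

```perl
use strict;
use warnings;

our $AUTOLOAD;
my @log;    # what each handler saw (for illustration only)

# Hypothetical handler for a node of the form [ 'name', value ].
sub name {
    my $node = shift;
    push @log, "name=$node->[1]";
}

# Safety net: called for any rule name that has no handler yet.
sub AUTOLOAD {
    my $node = shift;
    return if $AUTOLOAD =~ /::DESTROY\z/;
    push @log, "unhandled $AUTOLOAD";
}

# Walk the tree and call the sub named by the first element of each node.
sub dispatch {
    my $tree = shift;
    for my $node (@$tree) {
        next unless ref($node) eq 'ARRAY';
        if ( ref( $node->[0] ) eq 'ARRAY' ) {
            dispatch($node);    # a list of nodes: descend
        }
        else {
            my $handler = 'main::' . $node->[0];
            no strict 'refs';    # symbolic call by rule name
            &{$handler}($node);
        }
    }
}

# Hand-built tree (an assumption): one handled rule, one unhandled rule.
my $tree = [ [ [ 'name', 'match' ], [ 'priority', '1' ] ] ];
dispatch($tree);
print "$_\n" for @log;
```

Because the missing priority handler lands in AUTOLOAD instead of dying, a grammar change can be tried out before the code that consumes the new node exists.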
- Replace all regex character sets with a symbolic reference. Well, that's not quite correct: replace all but the last instance! In other words, things like:
    style_option: 'version' { [@item[0..$#item]] }
        | 'xmlns:xsl' { [@item[0..$#item]] }
        | 'id' { [@item[0..$#item]] }
        | 'extension-element-prefixes' { [@item[0..$#item]] }
        | 'exclude-result-prefixes' { [@item[0..$#item]] }
        | /[a-zA-Z0-9:_.\-]+/ { [@item[0..$#item]] }
Becomes:
    style_option: 'version' { [@item[0..$#item]] }
        | 'xmlns:xsl' { [@item[0..$#item]] }
        | 'id' { [@item[0..$#item]] }
        | 'extension-element-prefixes' { [@item[0..$#item]] }
        | 'exclude-result-prefixes' { [@item[0..$#item]] }
        | char_set { [@item[0..$#item]] }
    char_set: /[a-zA-Z0-9:_.\-]+/ { $item[1] }
Think of this as yet another version of the 'No magic numbers' rule. Besides, it's likely that you will want to use 'char_set' elsewhere, and having a single point of definition makes later changes manageable!
- New sym-refs need not disturb the parse tree; you can 'stealth' them in, i.e. use { $item[1] } in place of { [$item[0],$item[1]] }. (See previous example.)
- Parse trees usually don't need literals; remove them when you can. For instance, if you have something like this:
    preserve_space: 'preserve-space' '[' 'elements' '=' qstring ']' paren(?)
You will do better to have an action like this:
    { [$item[0],$item[5]] }
As you can see, 'qstring' is the only significant bit here, so the returned value for this rule is the rule name followed by the rule value.
- If you use the 'special magic' to create a grammar class from the command line using:
> perl -MParse::RecDescent - grammar Yet::Another::Grammar
be aware that this might not work the same if the original (presumably in-line) grammar depended on $::RD_AUTOACTION = q { [@item[0..$#item]] }; or similar. The magic method ignores RD_AUTOACTION and uses its own default action as needed. The solution is to duplicate the auto action by hand in the grammar file. This is not such a big deal, since by then most actions will already have been customized.
- There is still no free lunch! Eventually you will come to a point where you need to parse either multi-line comments or something similar. Be aware that just because everyone said to use Parse::RecDescent doesn't mean that the answer is easy. It's not; you still need to bite the bullet and do the work. You may at first think that <perl_quotelike> is the way out. Do not be deceived! When the documentation says
    Parse::RecDescent provides limited support for parsing subsets of Perl, namely: quote-like operators, Perl variables, and complete code blocks.
it is being literal. If your language is not Perl, then this shortcut will not get you to the church on time! Lest I be accused of talking around the problem, here is what I do to support multi-line comments:
    xcomment: <skip: qr/[ \t]*/> newline(0..) '<!--'
        {
            ($text,$return) = main::parse_delimited($text,'<!--','-->');
            $return = ['xcomment',$return];
        }
Where the function used looks like:
    # Requires: use Text::DelimMatch;
    sub parse_delimited {
        my $text       = shift;
        my $startdelim = shift;
        my $enddelim   = shift;
        my $mc = Text::DelimMatch->new( $startdelim, $enddelim );
        # The grammar has already consumed the opening delimiter, so put it back.
        my ( $p, $m, $r ) = $mc->match( $startdelim . $text );
        if ($p) {
            $text = $p;
        }
        else {
            $text = "";
        }
        $text .= $r if ($r);
        # Strip the delimiters from the matched comment body.
        $m =~ s/^$startdelim//;
        $m =~ s/$enddelim$//;
        return $text, $m;
    }
It is not a perfect solution; as the documentation says,
    Modifying the value of the variable $text may confuse the column counting mechanism
but other than that it does have the virtue of 'working'!
- Greediest production in a rule goes last. For instance, given the following:
    startrule: is_printable(s) {[$item[0],$item[1]]}
        | name {[$item[0],$item[1]]}
    is_printable: <skip: ''> /[[:print:]]+/ { [$item[0],$item[2]] }
    name: 'match' { [@item[0..$#item]] }
        | 'name' { [@item[0..$#item]] }
        | 'mode' { [@item[0..$#item]] }
        | 'priority' { $item[1] }
You are never going to get to 'name', because 'is_printable' will consume all of the characters in any given 'name'; do not pass Go, do not etc. Further, correct ordering, from least greedy to most, allows the last sub-rule to act as a backstop for the rule in general.
–hsm
"Never try to teach a pig to sing…it wastes your time and it annoys the pig."