Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I haven't thought of this topic much before you brought it up and I haven't researched it, so I'm making it up as I go along. However, I do have experience with parsing and with P::RD specifically, so hopefully you'll find it useful.


  • Is it necessarily possible to reverse the data through the grammar to get the command line?
  • or is this only possible for special grammars?
  • Is there something I need to in my grammar to make this possible?

Yes, it's possible to deparse a parse tree, if the parse tree contains sufficient information. That means that your parser must return sufficient information to regenerate an acceptably similar source. (e.g. If your language accepts single-quoted and double-quoted string literals, you may not care which one is used in the deparsed text as long as the resulting text is equivalent to the original. You probably don't care about whitespace and comments either.) With P::RD, that is controlled by the grammar.

  • Is there a module that already reverses the parsing of Parse::RecDescent parsers?
  • Is there another module that makes this bi-directionality easier (I'd prefer not to learn another grammar specification syntax, but could)?
  • Is there a module that already does all this soup-to-nuts?

I'm afraid the grammar couldn't be the same as a the one used by P::RD. Actions ({ ... }) and many directives (<...>) cannot be automatically reversed. Error checks would be different. etc.

Now, something like BNF could be automatically reversed, so you could write a parser and deparser that accepted pure BNF. The problems with this approach is that the product returned by the parser would be of limited use. It would require further processing. This is what the P::RD lumps in with the grammar, and that's what's not automatically reversible.


Next, let's address some issues with the grammar.

Bugs:

  • Use of captures is undocumented and unnecessary. Use "{ $item[1] }" instead of captures and "{ $1 }".
  • "/ident/" incorrectly matches the start of "identx".
  • You don't check if the end of input was reached, so trailing errors in your input can be ignored. In fact, it happens in one of your test cases: ".dfo" is silently ignored in "hap -b sknxharvest01 -enc testfile.dfo". Note the "/\Z/" I added.
  • A common mistake is to assume the strict and warnings settings from the caller are used for the parser. You need to include these in the code block in the grammar if you want them.

Sketchy Code:

  • No reason to compile the grammar every time.
  • "rule : subrule { $item[1] }" is the same as "rule : subrule".
  • Each rule invoked adds a fair bit of time to the overhead when using P::RD, so I inlined the trivial ones.
  • Something that's optional should be made optional by a rule, not by matching a zero-length string. (Re: optvalue)

Stylistic:

  • I renamed "startrule" to "parse" since I find "$parser->parse($text)" appealing.
  • Shortened "command", "pacakge" and "option" to their well known abbreviations: "cmd", "pkg" and "opt".
  • I seperated token rules from other rules and uppercased them. P::RD doesn't distinguish between tokens and higher-level rules, but there tends to be subtle differences between them. For example, I tend to avoid including $item[0] in the returned value for tokens, while I tend to include it for other rules.
  • I align : and |. I find this helps readability a lot.
# build_parser.pl use strict; use warnings; use Parse::RecDescent qw( ); my $grammar = <<'__END_OF_GRAMMAR__'; { # Lexical pragmas and lexcial variables # here are in the same scope as the sub # created for each grammar rule. use strict; use warnings; my %KNOWN_COMMANDS = map { $_ => 1 } qw( hap hccmrg hup hspp hpp hpg hdp hdlp hcp ); } parse : cmdline /\Z/ { $item[1] } cmdline : pkgcmd opt(s?) { [ @item[1,2] ] } pkgcmd : IDENT pkgcmd_[ $item[1] ] { $item[1] } pkgcmd_ : { $KNOWN_COMMANDS{$arg[0]} } | <error:Unknown command> opt : FLAG opt_ { [ @item[1,2] ] } opt_ : IDENT | { 1 } FLAG : /-\w+/ IDENT : /\w+/ __END_OF_GRAMMAR__ Parse::RecDescent->Precompile($grammar, 'Parser') or die("Bad Grammar\n");

In this post, I show how to deparse the tree created by the above grammar. For more complex trees, it can be very useful to have each non-token rule returned a bless object. Then some method in the class would be responsible for deparsing itself instead of relying on Deparse to do it. I'll get to that in a separate post.

A quick note before contining: You'll notice the parse tree doesn't differentiate between "-opt" and "-opt 1", so you'll get "-opt 1" when you deparse it. This may be acceptable, where it would be a case of being "acceptably similar" as mentioned above. If it's not acceptable, then it's a case of the parse tree not holding enough information as is.

Basically, I created a function for each rule. The function resemble the associated rule, but deparses it instead. Note that your grammar is very trivial (has no choices in it really), which is why you didn't need to use $item[0] (or some other constants) anywhere, and you don't have to make any choices when deparsing.

# Deparse.pm use strict; use warnings; package Deparser; my $skip = ' '; sub add_skip { if (length($_[0])) { return $skip . $_[0]; } else { return ''; } } sub loop(&@) { my $cb = shift(@_); return join $skip, map { $cb->() } @_; } sub deparse { &cmdline } sub cmdline { my ($node) = @_; my ($pkgcmd, $opts) = @$node; my $text = pkgcmd($pkgcmd); $text .= add_skip(loop { opt($_) } @$opts); return $text; } sub pkgcmd { my ($node) = @_; return IDENT($node); } sub opt { my ($node) = @_; my ($flag, $val) = @$node; my $text = FLAG($flag); $text .= add_skip(IDENT($val)); return $text; } sub FLAG { return $_[0] }; sub IDENT { return $_[0] }; 1;

As you can see, it's not too hard. Just make sure to reflect changes to your parser with changes to your deparser.

Now let's use the code.

# test.pl use strict; use warnings; use Data::Dumper qw( Dumper ); use Parser qw( ); use Deparser qw( ); { my $parser = Parser->new(); while (<DATA>) { chomp; print("Unparsed: $_\n"); my $tree = $parser->parse($_) or do { print("Bad data: $_\n"); next; }; my $dumper = Data::Dumper->new([ $tree ]); $dumper->Useqq(1); $dumper->Terse(1); $dumper->Indent(0); print("Parsed: ", $dumper->Dump(), "\n"); my $deparsed = Deparser::deparse($tree); print("Deparsed: $deparsed\n"); } continue { print("\n"); } } __DATA__ hap -b hap -b sknxharvest01 hap -b sknxharvest01 -enc testfile.dfo hap -b sknxharvest01 -usr cgowing -pass chaspass hap -badflag hap -prompt hap -b sknxharvest01 -prompt hap -prompt -b sknxharvest01

Results:

>perl build_parser.pl >perl test.pl Unparsed: hap -b Parsed: ["hap",[["-b",1]]] Deparsed: hap -b 1 Unparsed: hap -b sknxharvest01 Parsed: ["hap",[["-b","sknxharvest01"]]] Deparsed: hap -b sknxharvest01 Unparsed: hap -b sknxharvest01 -enc testfile.dfo Bad data: hap -b sknxharvest01 -enc testfile.dfo Unparsed: hap -b sknxharvest01 -usr cgowing -pass chaspass Parsed: ["hap",[["-b","sknxharvest01"],["-usr","cgowing"],["-pass"," +chaspass"]]] Deparsed: hap -b sknxharvest01 -usr cgowing -pass chaspass Unparsed: hap -badflag Parsed: ["hap",[["-badflag",1]]] Deparsed: hap -badflag 1 Unparsed: hap -prompt Parsed: ["hap",[["-prompt",1]]] Deparsed: hap -prompt 1 Unparsed: hap -b sknxharvest01 -prompt Parsed: ["hap",[["-b","sknxharvest01"],["-prompt",1]]] Deparsed: hap -b sknxharvest01 -prompt 1 Unparsed: hap -prompt -b sknxharvest01 Parsed: ["hap",[["-prompt",1],["-b","sknxharvest01"]]] Deparsed: hap -prompt 1 -b sknxharvest01

Time for a break! Blessed version later.


In reply to Re: Reversible parsing (with Parse::RecDescent?) by ikegami
in thread Reversible parsing (with Parse::RecDescent?) by goibhniu

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2024-04-19 01:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found