Hello there

I need some help with Parse::RecDescent. I successfully created a grammar to parse a pseudo-ini file (more on the format in a second), but I would like to expand it to parse inline comments, as well as single-line comments.

Disclaimer: I didn't choose the format, and I can't change it. The only thing I can do with it is to parse it.

The file has section declarations like this:

[section_name]

and key/value associations like this:

    parameter=value
    values="may also be quoted"

It can have comment lines

    ; like this

and it allows for blank lines.

It admits multiple assignments to the same parameter, resulting in an array of values for that parameter:

    parameter=value1
    parameter=value2
    parameter=value3
    ; that would result in parameter = ( value1, value2, value3 )
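As a rough illustration of that accumulation, here is a minimal Perl sketch. The %node hash, the section name and the add_value helper are names made up for this example; they are not part of the file format.

```perl
use strict;
use warnings;

# Hypothetical storage: section name => parameter name => array of values
my %node;

# Hypothetical helper: append one value to a parameter's list of values
sub add_value {
    my ( $section, $param, $value ) = @_;
    push @{ $node{$section}{$param} }, $value;    # autovivifies the array
    return;
}

# Three assignments to the same parameter inside [section_name]
add_value( 'section_name', 'parameter', $_ ) for qw( value1 value2 value3 );

print join( ', ', @{ $node{section_name}{parameter} } ), "\n";
# prints: value1, value2, value3
```

Every parameter ends up holding an array reference, even when it was assigned only once, so the calling code never has to check whether it got a scalar or a list.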

A parameter could also be assigned a full Pike multivalue data structure, that is, an array:

    parameter=({ value1, value2, value3 })

or a mapping (key/value pairs):

    parameter=([ "key1" : "value1", "key2" : "value2" ])

These structures can span multiple lines and can be nested, as in:

    parameter=({
      ([
        "key1" : "value1",
        "key2" : ({ "array", "value" }),
      ]),
      "second element of this array",
      ({
        "and here is another array",
        ({
          "with another one nested",
        }),
        ([
          "that" : "contains",
          "one" : "more",
          "hash" : "value",
        ]),
      }),
      "Hooray!",
    })
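For reference, this is roughly the Perl data structure the nested example above should turn into. It is a hand-written sketch, assuming that a Pike array maps to an array reference and a Pike mapping to a hash reference; the variable name $parameter is only illustrative.

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumed mapping: Pike array   ({ ... }) -> Perl array reference
#                  Pike mapping ([ ... ]) -> Perl hash reference
my $parameter = [
    {
        key1 => 'value1',
        key2 => [ 'array', 'value' ],
    },
    'second element of this array',
    [
        'and here is another array',
        [ 'with another one nested' ],
        {
            that => 'contains',
            one  => 'more',
            hash => 'value',
        },
    ],
    'Hooray!',
];

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper($parameter);

# Nested elements are reached with ordinary dereferencing
print $parameter->[2][2]{one}, "\n";    # prints: more
```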

As I said, I wrote a grammar that correctly parses a 350 KB file (slowly, but it works well):

    # $Id: g3.txt,v 1.14 2010/07/23 12:41:24 bronto Exp bronto $

    AsIni: Line(s?) /\Z/

    Line: CommentLine
        | BlankLine
        | SectionDeclaration
        | AssignmentLine
        | <error>

    CommentLine: <skip: q{}> /^\s*/m ';' /.*$/m
      {
        print STDERR qq{\tSkipping comment: $item[4]\n} ;
      }

    BlankLine: <skip: q{}> /^\s+$/m
      {
        print STDERR qq{\tSkipping blank line\n}
      }

    SectionDeclaration: '[' /[^\]]+/ ']'
      {
        print STDERR qq{In section "$item[2]"\n} ;
        my $sectionname = $item[2] ;
        $AsIni::section = $sectionname ;
      }

    AssignmentLine: Parameter '=' Value(?)
      {
        my $distvalue = $item[3] ;
        my $parmname  = $item[1] ;
        my $paramvalue ;

        ( $paramvalue ) = @$distvalue ;

        if ( not exists $AsIni::node{$AsIni::section}{$parmname} ) {
          $AsIni::node{$AsIni::section}{$parmname} = [] ;
        }

        # Get a reference to the current array of values for this parameter
        # in $current
        my $current = $AsIni::node{$AsIni::section}{$parmname} ;

        # We can update this safely, since we are using the reference
        push @$current,$paramvalue ;
      }

    Parameter: /\w[\w\s-]*/
      {
        $return = $item[1] ;
      }

    Value: PikeStructure | ValueString
      {
        $return = $item[1] ;
      }

    ValueString: QuotedString | UnquotedString
      {
        $return = $item[1] ;
      }

    QuotedString: '"' /[^"]+/ '"'
      {
        $return = $item[2] ;
      }

    UnquotedString: /.+/
      {
        $return = $item[1] ;
      }

    # This rule matches a number, but rejects null-length results
    Number: /[+-]?\d*(\.\d+)?/ <reject: $item[1] eq ''>
      {
        $return = $item[1] ;
      }

    PikeStructure: PikeArray | PikeMapping
      {
        # $item[1] is a reference to an array (PikeArray) or hash
        # (PikeMapping). We bubble it up as is
        $return = $item[1] ;
      }

    PikeArray: '({' PikeArrayContent '})'
      {
        # $item[2] is a PikeArrayContent, and since PikeArrayContent
        # bubbles up a reference to an array of PikeValues, this should be
        # an array reference that we can safely bubble up as is.
        $return = $item[2] ;
      }

    PikeMapping: '([' PikeMappingContent '])'
      {
        # $item[2] is a PikeMappingContent, and since PikeMappingContent
        # bubbles up an hash reference, this should be an hash reference that we
        # can safely bubble up as is.
        $return = $item[2] ;
      }

    PikeStructureSeparator: ','

    PikeArrayContent: PikeArraySequence(?) PikeStructureSeparator(?)
      {
        # $item[1] comes from a repetition of PikeArraySequence,
        # so it is a reference to an array of 0 or 1 PikeArraySequence.
        # In turn, PikeArraySequence is a reference to an array of
        # PikeValue's. We don't want to change the PikeValue's but we need
        # to unroll $item[1] before bubbling it up.
        ( $return ) = @{ $item[1] } ;
      }

    PikeArraySequence: PikeValue PikeArrayFurtherValue(s?)
      {
        # $item[1] is a PikeValue, hence:
        # - a reference to an array or hash (if PikeStructure)
        # - a scalar (if QuotedString or Number)
        #
        # $item[2] comes from a repetition of PikeArrayFurtherValue,
        # so it is a reference to an array of 0 or 1 PikeArrayFurtherValue.
        # In turn, PikeArrayFurtherValue just returns a PikeValue. So,
        # we actually don't want to change $item[1], but we need to
        # unroll $item[2] before returning it. Actually, we return an
        # array reference with the whole thing.
        $return = [ $item[1], @{ $item[2] } ] ;
      }

    PikeArrayFurtherValue: PikeStructureSeparator PikeValue
      {
        # $item[2] is a PikeValue, hence:
        # - a reference to an array or hash (if PikeStructure)
        # - a scalar (if QuotedString or Number)
        # We bubble it up as is.
        $return = $item[2] ;
      }

    PikeMappingContent: PikeMappingSequence(?) PikeStructureSeparator(?)
      {
        # Since we have a repetition here, $item[1] is a reference to an
        # array which may contain 0 or 1 PikeMappingSequence's.
        # In turn, PikeMappingSequence returns an hash reference.
        # So, if we want the hash reference to bubble up, we have to
        # unwrap it and return it as is.
        ( $return ) = @{ $item[1] } ;
      }

    PikeMappingSequence: PikeMappingPair PikeMappingFurtherPair(s?)
      {
        # $item[1] is a PikeMappingPair, hence a reference to an array
        # of two elements: a string and a PikeValue, that is:
        # - a reference to an array or hash (if Pikevalue ~ PikeStructure)
        # - a scalar (if PikeValue ~ QuotedString or Number)
        #
        # $item[2] has a repetition, so it is a reference to an array of
        # PikeMappingFurtherPair's. Since PikeMappingFurtherPair just
        # returns a PikeMappingPair (see $item[1]), then $item[2] is
        # a reference to an array where each element is, in turn, a
        # reference to an array of two elements.
        #
        # Since we are going to return an hash here, we create a reference
        # to an hash; to correctly unroll the values of $item[1] and
        # $item[2] we:
        # - simply dereference $item[1], hence unrolling the only hash
        #   pair the array contained
        # - we dereference $item[2], getting an array of arrays, and
        #   then we use map to further unroll the key/value pairs
        #
        # We then bubble up the outcome
        $return = { @{ $item[1] } , map( @$_ , @{ $item[2] } ) } ;
      }

    PikeMappingPair: QuotedString ':' PikeValue
      {
        # $item[1] is a scalar (QuotedString)
        # $item[3] is a PikeValue, hence:
        # - a reference to an array or hash (if Pikevalue ~ PikeStructure)
        # - a scalar (if PikeValue ~ QuotedString or Number)
        # We throw them up together as a single entity: a reference to an array
        $return = [ $item[1], $item[3] ] ;
      }

    PikeMappingFurtherPair: PikeStructureSeparator PikeMappingPair
      {
        # $item[2] is a PikeMappingPair, hence a reference to an array
        # containing a QuotedString (the first) and a PikeValue, hence:
        # - a reference to an array or hash (if Pikevalue ~ PikeStructure)
        # - a scalar (if PikeValue ~ QuotedString or Number)
        $return = $item[2] ;
      }

    PikeValue: PikeStructure | QuotedString | Number
      {
        # $item[1] is:
        # - a reference to an array or hash (if PikeStructure)
        # - a scalar (if QuotedString or Number)
        $return = $item[1] ;
      }

I would like to extend it so that I could use inline comments, e.g.:

    parameter="value" ; like this
    parameter=({
      "value1", ; but
      "value2", ; also
      "value3", ; like
      "value4",
    }) ; this

I tried a few solutions, but the only result was that the parser failed, or that a single inline comment swallowed far more than it should. Any suggestions?

Thanks in advance!

Ciao!
--bronto


In theory, there is no difference between theory and practice. In practice, there is.

In reply to Extending a Parse::RecDescent grammar for inline comments by bronto
