Hello there

I need some help with Parse::RecDescent. I successfully created a grammar to parse a pseudo-ini file (more on the format in a second), but I would like to expand it to parse inline comments, as well as single-line comments.

Disclaimer: I didn't choose the format, and I can't change it. The only thing I can do with it is to parse it.

The file has section declarations like this:

[section_name]

and key/value associations like this:

    parameter=value
    values="may also be quoted"

It can have comment lines

; like this

and it allows for blank lines.

It allows multiple assignments to the same parameter, resulting in an array of values for that parameter:

    parameter=value1
    parameter=value2
    parameter=value3
    
    ; that would result in parameter = ( value1, value2, value3 )
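To make the expected result concrete, here is a minimal sketch of how the repeated assignments accumulate. It mimics what the AssignmentLine action in the grammar further down does, but without the parser. The hash `%node` and the section name `section_name` are stand-ins for illustration only.

```perl
use strict;
use warnings;

# Hypothetical section name; the snippet above does not show one.
my $section = 'section_name';

# Stand-in for %AsIni::node: section => parameter => [ values ]
my %node;

# Each assignment to the same parameter pushes one more value
for my $value (qw(value1 value2 value3)) {
    push @{ $node{$section}{parameter} }, $value;
}

print join( ', ', @{ $node{$section}{parameter} } ), "\n";
```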

A parameter can also be assigned a full Pike multivalue data structure, that is, an array:

parameter=({ value1, value2, value3 })

or a mapping (key/value pairs):

parameter=([ "key1" : "value1", "key2" : "value2" ])

These structures can span multiple lines and be nested! As in:

    parameter=({
                ([
                    "key1" : "value1",
                    "key2" : ({ "array", "value" }),
                ]),
                "second element of this array",
                ({
                    "and here is another array",
                    ({
                        "with another one nested",
                    }),
                    ([
                        "that" : "contains",
                        "one"  : "more",
                        "hash" : "value",
                    ]),
                }),
                "Hooray!",
              })
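For reference, this is a sketch of the Perl structure I would expect that value to become, assuming Pike arrays map to array references and Pike mappings to hash references (which is what the grammar below returns). The variable name `$parameter` is only for illustration. In the real parse this reference would be one element of the parameter's array of values.

```perl
use strict;
use warnings;

# ({ ... }) becomes an array reference, ([ ... ]) a hash reference
my $parameter = [
    {
        key1 => 'value1',
        key2 => [ 'array', 'value' ],
    },
    'second element of this array',
    [
        'and here is another array',
        [ 'with another one nested' ],
        { that => 'contains', one => 'more', hash => 'value' },
    ],
    'Hooray!',
];

# Walk into the nesting: array inside a mapping, mapping inside an array
print $parameter->[0]{key2}[1], "\n";
print $parameter->[2][2]{that}, "\n";
```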

As I said, I wrote a grammar that correctly parses a 350 KB file (slowly, but it works well):

# $Id: g3.txt,v 1.14 2010/07/23 12:41:24 bronto Exp bronto $

AsIni: Line(s?) /\Z/

Line:    CommentLine
        | BlankLine
        | SectionDeclaration
        | AssignmentLine
        | <error>


CommentLine:    <skip: q{}> /^\s*/m ';' /.*$/m
  {
    print STDERR qq{\tSkipping comment: $item[4]\n} ;
  }

BlankLine:    <skip: q{}> /^\s+$/m
  {
    print STDERR qq{\tSkipping blank line\n}
  }

SectionDeclaration:    '[' /[^\]]+/ ']'
  {
    print STDERR qq{In section "$item[2]"\n} ;

    my $sectionname = $item[2] ;

    $AsIni::section = $sectionname ;
  }

AssignmentLine:    Parameter '=' Value(?)
  {
    my $distvalue = $item[3] ;
    my $parmname  = $item[1] ;
    my $paramvalue ;

    ( $paramvalue ) = @$distvalue ;

    if ( not exists $AsIni::node{$AsIni::section}{$parmname} ) {
      $AsIni::node{$AsIni::section}{$parmname} = [] ;
    }

    # Get a reference to the current array of values for this
    # parameter in $current
    my $current = $AsIni::node{$AsIni::section}{$parmname} ;

    # We can update this safely, since we are using the reference
    push @$current,$paramvalue ;

  }

Parameter:    /\w[\w\s-]*/
  {
    $return = $item[1] ;
  }

Value:        PikeStructure    | ValueString
  {
    $return = $item[1] ;
  }

ValueString:    QuotedString    | UnquotedString
  {
    $return = $item[1] ;
  }

QuotedString:    '"' /[^"]+/ '"'
  {
    $return = $item[2] ;
  }

UnquotedString:    /.+/
  {
    $return = $item[1] ;
  }

# This rule matches a number, but rejects null-length results
Number:    /[+-]?\d*(\.\d+)?/ <reject: $item[1] eq ''>
  {
    $return = $item[1] ;
  }

PikeStructure:    PikeArray    | PikeMapping
  {
    # $item[1] is a reference to an array (PikeArray) or hash
    # (PikeMapping). We bubble it up as is
    $return = $item[1] ;
  }

PikeArray:    '({' PikeArrayContent '})'
  {
    # $item[2] is a PikeArrayContent, and since PikeArrayContent
    # bubbles up a reference to an array of PikeValues, this should be
    # an array reference that we can safely bubble up as is.
    $return = $item[2] ;
  }

PikeMapping:    '([' PikeMappingContent '])'
  {
    # $item[2] is a PikeMappingContent, and since PikeMappingContent
    # bubbles up a hash reference, this should be a hash reference
    # that we can safely bubble up as is.
    $return = $item[2] ;
  }

PikeStructureSeparator:    ','

PikeArrayContent:    PikeArraySequence(?) PikeStructureSeparator(?)
  {
    # $item[1] comes from a repetition of PikeArraySequence,
    # so it is a reference to an array of 0 or 1 PikeArraySequence.
    # In turn, PikeArraySequence is a reference to an array of
    # PikeValue's. We don't want to change the PikeValue's but we need
    # to unroll $item[1] before bubbling it up.
    ( $return ) = @{ $item[1] } ;
  }

PikeArraySequence:    PikeValue PikeArrayFurtherValue(s?)
  {
    # $item[1] is a PikeValue, hence:
    # - a reference to an array or hash (if PikeStructure)
    # - a scalar (if QuotedString or Number)
    #
    # $item[2] comes from a repetition of PikeArrayFurtherValue,
    # so it is a reference to an array of zero or more
    # PikeArrayFurtherValues.
    # In turn, PikeArrayFurtherValue just returns a PikeValue. So,
    # we actually don't want to change $item[1], but we need to
    # unroll $item[2] before returning it. Actually, we return an
    # array reference with the whole thing.
    $return = [ $item[1], @{ $item[2] } ] ;
  }

PikeArrayFurtherValue:    PikeStructureSeparator PikeValue
  {
    # $item[2] is a PikeValue, hence:
    # - a reference to an array or hash (if PikeStructure)
    # - a scalar (if QuotedString or Number)
    # We bubble it up as is.
    $return = $item[2] ;
  }

PikeMappingContent:    PikeMappingSequence(?) PikeStructureSeparator(?)
  {
    # Since we have a repetition here, $item[1] is a reference to an
    # array which may contain 0 or 1 PikeMappingSequence's.
    # In turn, PikeMappingSequence returns a hash reference.
    # So, if we want the hash reference to bubble up, we have to
    # unwrap it and return it as is.
    
    ( $return ) = @{ $item[1] } ;
  }

PikeMappingSequence:    PikeMappingPair PikeMappingFurtherPair(s?)
  {
    # $item[1] is a PikeMappingPair, hence a reference to an array
    # of two elements: a string and a PikeValue, that is:
    # - a reference to an array or hash (if Pikevalue ~ PikeStructure)
    # - a scalar (if PikeValue ~ QuotedString or Number)
    #
    # $item[2] has a repetition, so it is a reference to an array of
    # PikeMappingFurtherPair's. Since PikeMappingFurtherPair just
    # returns a PikeMappingPair (see $item[1]), then $item[2] is
    # a reference to an array where each element is, in turn, a
    # reference to an array of two elements.
    #
    # Since we are going to return a hash here, we create a reference
    # to a hash; to correctly unroll the values of $item[1] and
    # $item[2] we:
    # - simply dereference $item[1], hence unrolling the only hash
    #   pair the array contained
    # - we dereference $item[2], getting an array of arrays, and
    #   then we use map to further unroll the key/value pairs
    #
    # We then bubble up the outcome
    $return = { @{ $item[1] } , map( @$_ , @{ $item[2] } ) } ;
  }

PikeMappingPair:    QuotedString ':' PikeValue
  {
    # $item[1] is a scalar (QuotedString)
    # $item[3] is a PikeValue, hence:
    # - a reference to an array or hash (if Pikevalue ~ PikeStructure)
    # - a scalar (if PikeValue ~ QuotedString or Number)
    # We throw them up together as a single entity: a reference to an
    # array
    $return = [ $item[1], $item[3] ] ;
  }

PikeMappingFurtherPair:    PikeStructureSeparator PikeMappingPair
  {
    # $item[2] is a PikeMappingPair, hence a reference to an array
    # containing a QuotedString (the first) and a PikeValue, hence:
    # - a reference to an array or hash (if Pikevalue ~ PikeStructure)
    # - a scalar (if PikeValue ~ QuotedString or Number)
    $return = $item[2] ;
  }

PikeValue:        PikeStructure | QuotedString | Number
  {
    # $item[1] is:
    # - a reference to an array or hash (if PikeStructure)
    # - a scalar (if QuotedString or Number)
    $return = $item[1] ;
  }

I would like to extend it so that I could use inline comments, e.g.:

    parameter="value" ; like this
    
    parameter=({ "value1",     ; but
                 "value2",     ; also
                 "value3",     ; like
                 "value4", })  ; this

I tried a few solutions, but the only result was that the parser failed, or that an inline comment swallowed far more than it should. Any suggestions?

Thanks in advance!

Ciao!
--bronto

In theory, there is no difference between theory and practice. In practice, there is.

In reply to Extending a Parse::RecDescent grammar for inline comments by bronto
