I need some help with Parse::RecDescent. I successfully created a grammar to parse a pseudo-ini file (more on the format in a second), but I would like to expand it to parse inline comments, as well as single-line comments.
Disclaimer: I didn't choose the format, and I can't change it. The only thing I can do with it is to parse it.
The file has section declarations like this:
[section_name]
and key/value associations like this:
parameter=value
values="may also be quoted"
It can have comment lines
; like this
and it allows for blank lines.
It allows multiple assignments to the same parameter, which produce an array of values for that parameter:
parameter=value1
parameter=value2
parameter=value3
; that would result in parameter = ( value1, value2, value3 )
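To make the intended result concrete, here is a minimal plain-Perl sketch of how the repeated assignments should accumulate. It does not use the grammar at all; `%node` and `$section` are illustrative names that only mirror the storage the grammar uses further down, and the `split` is a simplification that ignores quoting and Pike structures.

```perl
use strict;
use warnings;

# Illustrative storage: section name => parameter name => [ values ]
my %node;
my $section = 'section_name';

for my $line ( 'parameter=value1', 'parameter=value2', 'parameter=value3' ) {
    # Split on the first '=' only, so the value may itself contain '='
    my ( $name, $value ) = split /=/, $line, 2;

    # Autovivification creates the array reference on first use,
    # later assignments are simply appended to it
    push @{ $node{$section}{$name} }, $value;
}

print join( ', ', @{ $node{$section}{parameter} } ), "\n";
```

Every parameter ends up as an array reference, even when it is assigned only once, which keeps the access code uniform.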
A parameter can also be assigned a full Pike multi-value data structure, that is, an array:
parameter=({ value1, value2, value3 })
or a mapping (key/value pairs):
parameter=([ "key1" : "value1", "key2" : "value2" ])
These structures can span multiple lines and be nested, as in:
parameter=({
([
"key1" : "value1",
"key2" : ({ "array", "value" }),
]),
"second element of this array",
({
"and here is another array",
({
"with another one nested",
}),
([
"that" : "contains",
"one" : "more",
"hash" : "value",
]),
}),
"Hooray!",
})
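As a sketch of what I expect for that value, assuming Pike arrays map to Perl array references and Pike mappings to hash references (which is what the grammar below is meant to produce), the nested example should come out roughly like this. `$expected` is just an illustrative name.

```perl
use strict;
use warnings;

# Expected Perl counterpart of the nested Pike value above:
# ({ ... }) becomes [ ... ] and ([ ... ]) becomes { ... }
my $expected = [
    {
        key1 => 'value1',
        key2 => [ 'array', 'value' ],
    },
    'second element of this array',
    [
        'and here is another array',
        [ 'with another one nested' ],
        {
            that => 'contains',
            one  => 'more',
            hash => 'value',
        },
    ],
    'Hooray!',
];

# Walk into the innermost array to show the nesting is reachable
print $expected->[2][1][0], "\n";
```

Note that the trailing commas in the Pike source do not add empty elements.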
As I said, I wrote a grammar that correctly parses a 350 KB file (slowly, but it works):
# $Id: g3.txt,v 1.14 2010/07/23 12:41:24 bronto Exp bronto $
AsIni: Line(s?) /\Z/
Line: CommentLine
| BlankLine
| SectionDeclaration
| AssignmentLine
| <error>
CommentLine: <skip: q{}> /^\s*/m ';' /.*$/m
{
print STDERR qq{\tSkipping comment: $item[4]\n} ;
}
BlankLine: <skip: q{}> /^\s+$/m
{
print STDERR qq{\tSkipping blank line\n}
}
SectionDeclaration: '[' /[^\]]+/ ']'
{
print STDERR qq{In section "$item[2]"\n} ;
my $sectionname = $item[2] ;
$AsIni::section = $sectionname ;
}
AssignmentLine: Parameter '=' Value(?)
{
my $distvalue = $item[3] ;
my $parmname = $item[1] ;
my $paramvalue ;
( $paramvalue ) = @$distvalue ;
if ( not exists $AsIni::node{$AsIni::section}{$parmname} ) {
$AsIni::node{$AsIni::section}{$parmname} = [] ;
}
# Get a reference to the current array of values for this parameter
# in $current
my $current = $AsIni::node{$AsIni::section}{$parmname} ;
# We can update this safely, since we are using the reference
push @$current,$paramvalue ;
}
Parameter: /\w[\w\s-]*/
{
$return = $item[1] ;
}
Value: PikeStructure | ValueString
{
$return = $item[1] ;
}
ValueString: QuotedString | UnquotedString
{
$return = $item[1] ;
}
QuotedString: '"' /[^"]+/ '"'
{
$return = $item[2] ;
}
UnquotedString: /.+/
{
$return = $item[1] ;
}
# This rule matches a number, but rejects null-length results
Number: /[+-]?\d*(\.\d+)?/ <reject: $item[1] eq ''>
{
$return = $item[1] ;
}
PikeStructure: PikeArray | PikeMapping
{
# $item[1] is a reference to an array (PikeArray) or hash
# (PikeMapping). We bubble it up as is
$return = $item[1] ;
}
PikeArray: '({' PikeArrayContent '})'
{
# $item[2] is a PikeArrayContent, and since PikeArrayContent
# bubbles up a reference to an array of PikeValues, this should be
# an array reference that we can safely bubble up as is.
$return = $item[2] ;
}
PikeMapping: '([' PikeMappingContent '])'
{
# $item[2] is a PikeMappingContent, and since PikeMappingContent
# bubbles up an hash reference, this should be an hash reference that we
# can safely bubble up as is.
$return = $item[2] ;
}
PikeStructureSeparator: ','
PikeArrayContent: PikeArraySequence(?) PikeStructureSeparator(?)
{
# $item[1] comes from a repetition of PikeArraySequence,
# so it is a reference to an array of 0 or 1 PikeArraySequence.
# In turn, PikeArraySequence is a reference to an array of
# PikeValue's. We don't want to change the PikeValue's but we need
# to unroll $item[1] before bubbling it up.
( $return ) = @{ $item[1] } ;
}
PikeArraySequence: PikeValue PikeArrayFurtherValue(s?)
{
# $item[1] is a PikeValue, hence:
# - a reference to an array or hash (if PikeStructure)
# - a scalar (if QuotedString or Number)
#
# $item[2] comes from a repetition of PikeArrayFurtherValue,
# so it is a reference to an array of zero or more
# PikeArrayFurtherValue's.
# In turn, PikeArrayFurtherValue just returns a PikeValue. So,
# we actually don't want to change $item[1], but we need to
# unroll $item[2] before returning it. Actually, we return an
# array reference with the whole thing.
$return = [ $item[1], @{ $item[2] } ] ;
}
PikeArrayFurtherValue: PikeStructureSeparator PikeValue
{
# $item[2] is a PikeValue, hence:
# - a reference to an array or hash (if PikeStructure)
# - a scalar (if QuotedString or Number)
# We bubble it up as is.
$return = $item[2] ;
}
PikeMappingContent: PikeMappingSequence(?) PikeStructureSeparator(?)
{
# Since we have a repetition here, $item[1] is a reference to an
# array which may contain 0 or 1 PikeMappingSequence's.
# In turn, PikeMappingSequence returns an hash reference.
# So, if we want the hash reference to bubble up, we have to
# unwrap it and return it as is.
( $return ) = @{ $item[1] } ;
}
PikeMappingSequence: PikeMappingPair PikeMappingFurtherPair(s?)
{
# $item[1] is a PikeMappingPair, hence a reference to an array
# of two elements: a string and a PikeValue, that is:
# - a reference to an array or hash (if Pikevalue ~ PikeStructure)
# - a scalar (if PikeValue ~ QuotedString or Number)
#
# $item[2] has a repetition, so it is a reference to an array of
# PikeMappingFurtherPair's. Since PikeMappingFurtherPair just
# returns a PikeMappingPair (see $item[1]), then $item[2] is
# a reference to an array where each element is, in turn, a
# reference to an array of two elements.
#
# Since we are going to return an hash here, we create a reference
# to an hash; to correctly unroll the values of $item[1] and
# $item[2] we:
# - simply dereference $item[1], hence unrolling the only hash
# pair the array contained
# - we dereference $item[2], getting an array of arrays, and
# then we use map to further unroll the key/value pairs
#
# We then bubble up the outcome
$return = { @{ $item[1] } , map( @$_ , @{ $item[2] } ) } ;
}
PikeMappingPair: QuotedString ':' PikeValue
{
# $item[1] is a scalar (QuotedString)
# $item[3] is a PikeValue, hence:
# - a reference to an array or hash (if Pikevalue ~ PikeStructure)
# - a scalar (if PikeValue ~ QuotedString or Number)
# We throw them up together as a single entity: a reference to an
# array
$return = [ $item[1], $item[3] ] ;
}
PikeMappingFurtherPair: PikeStructureSeparator PikeMappingPair
{
# $item[2] is a PikeMappingPair, hence a reference to an array
# containing a QuotedString (the first) and a PikeValue, hence:
# - a reference to an array or hash (if Pikevalue ~ PikeStructure)
# - a scalar (if PikeValue ~ QuotedString or Number)
$return = $item[2] ;
}
PikeValue: PikeStructure | QuotedString | Number
{
# $item[1] is:
# - a reference to an array or hash (if PikeStructure)
# - a scalar (if QuotedString or Number)
$return = $item[1] ;
}
I tried a few approaches, but each one either made the parser fail or let a single inline comment swallow far more than it should. Any suggestions?