So, I've been happily learning how to use grammars for parsing with Parse::RecDescent, and I've been very pleased with its power and flexibility so far... but I'm stumbling over a problem, and for the life of me I can't understand why it's happening!
I highly doubt that this could be a bug in PRD (it's used by too many people), but even the barest code demonstrates this frustrating problem.

Basically, it's this: changing the prefix pattern has NO effect! If I print out $skip, it shows that it is set as expected, but the behavior of PRD does not change from the default. This happens whether I am using a <skip:...> directive, setting $skip from within an action, or setting $Parse::RecDescent::skip from outside the grammar code.

Here's a little demonstration of what I'm getting... Code like this:
Outputs this:
I'm pretty certain it's not a problem with the regexes I'm using because when I do something like this instead:
I get this output:
Update: As I suspected, the "skip" or "terminal prefix" functionality is *not* broken... but it is not quite as DWIMmy as I was expecting with regard to how the specified regular expression is used.

I still don't think I understand the subtle details, but as far as I can tell, one should keep in mind that the skip regex (aka terminal prefix) is matched ONLY ONCE. Therefore, one should probably wrap the whole thing in a group followed by an asterisk to ensure *everything* one wants to skip is consumed in *one pass*.

To further show what I mean, here is one of the many non-working regexes that brought me here:

/(?: \# .*? \n? | \s* )?/msx

It will match at most ONE INSTANCE of a comment or a run of whitespace. (And because .*? is lazy and \n? is optional, the comment branch can be satisfied right after the # without ever reaching the end of the line.) My example text has several adjoining instances of comments and whitespace, and only the first match was being consumed!

Here is the regex that does what I want:

/(?: \# .*? \n | \s )*/msx

As you can see, it consumes ALL comments AND whitespace until nothing matches. SMALL change, BIG difference!

I now have this working the way I want, by assigning it to $skip in the "start-up actions":

$skip = '(?msx: \# .*? \n | \s )*'

This has been another fun and edifying expedition, and if anyone reading this has any additional questions, I am happy to share whatever meager knowledge I have gained :)
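For anyone who wants to see the difference between the two patterns without building a grammar, here is a minimal sketch in plain Perl (no Parse::RecDescent involved). It assumes, as described above, that the prefix is stripped a single time, anchored at the current position. The sample text and the strip_prefix_once helper are made up for illustration and are not part of PRD.

```perl
use strict;
use warnings;

# Sample input (made up for this sketch): adjoining comments and whitespace
# in front of the first real token, "foo".
my $text = "# first comment\n  # second comment\n\nfoo bar";

# The two prefix patterns from the post, as precompiled regexes.
my $once   = qr/(?: \# .*? \n? | \s* )?/msx;   # optional group: at most one pass
my $repeat = qr/(?: \# .*? \n  | \s  )*/msx;   # repeated group: loops until nothing matches

# Hypothetical helper (not a PRD function): strip the prefix a single time,
# anchored at the start of the input, and return whatever is left.
sub strip_prefix_once {
    my ($pattern, $input) = @_;
    $input =~ s/\A(?:$pattern)//;
    return $input;
}

my $left_once   = strip_prefix_once($once,   $text);
my $left_repeat = strip_prefix_once($repeat, $text);

print "optional group leaves: [$left_once]\n";
print "repeated group leaves: [$left_repeat]\n";
```

With the repeated group, what remains is just "foo bar". With the optional group, most of the comment text is still sitting in front of "foo", so a terminal expecting "foo" at that position would not match.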