PerlMonks
Conditional continued matching with regexes
by bart (Canon) on Feb 05, 2007 at 22:10 UTC ( #598431=perlquestion )
bart has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to use a plain Perl s/// substitution to fix up the formatting of fields in a
CSV file, so that the real parser will no longer choke on it. The fields, separated by
semicolons, are formatted like this:
What I'm trying to do is leave the quoted fields alone, replace the comma in numeric fields with ".", and drop the unquoted question mark. The basis of what I've been using looks like this (I've added extensive regex comments describing what it does):
Now the part that I'm having some trouble with: I'm trying to add support for multiline records, that is, records containing newlines within quoted strings, but without reading in the whole data file at once. I can detect whether a quoted string is still open by making the closing quote optional and checking for its presence. The problem is: how do you continue parsing the same open string on the next line, until you find the first semicolon?
My idea was that, if the previous line was closed, the pattern should work as above, but if we were in a quoted field at the end, it should behave like: instead.
Now how do you do that? I've tried experimenting with the (?{CODE}) feature, still marked as "highly experimental" after over 5 years, but I don't quite get it, and I couldn't get it to work properly. Because of its "experimental nature" (it may be here to stay, but that doesn't mean it has been properly debugged), I'd like to avoid it anyway.
I've also thought about using /"/g to skip any leading remainder of a quoted string, but s///g simply ignores \G.
So... What would you do?
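To make the goal concrete, here is a rough sketch of one possible way to carry the "still inside a quoted field" state from one line to the next without (?{CODE}) or \G. It is not the pattern referred to above: the sub name fix_line, the sample data, and the assumption that a literal quote inside a quoted field is doubled ("") are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): fix one physical line.
# Takes the line and a flag saying whether the previous line ended inside
# an open quoted field; returns the fixed line and the new flag.
# Assumed format: ";" separates fields, quoted fields use double quotes,
# and a literal quote inside a quoted field is doubled ("").
sub fix_line {
    my ($line, $in_quote) = @_;
    my $head = '';

    if ($in_quote) {
        # Peel off the rest of the open quoted field and leave it untouched.
        # The atomic group keeps a doubled quote from being split in two.
        if ($line =~ s/\A ( (?> (?: [^"] | "" )* ) " )//x) {
            $head     = $1;
            $in_quote = 0;
        }
        else {
            return ($line, 1);    # no closing quote on this line either
        }
    }

    $line =~ s{
          ( " (?: [^"] | "" )* ("?) )                        # quoted field, closing quote optional
        | (?<![^;]) ( -? \d+ ) , ( \d+ ) (?= [;\n] | \z )    # numeric field with a decimal comma
        | (?<![^;]) \? (?= [;\n] | \z )                      # field that is a lone question mark
    }{
        defined $1   ? do { $in_quote = 1 if $2 eq ''; $1 }  # keep as is, note if still open
        : defined $3 ? "$3.$4"
        :              ''
    }gex;

    return ($head . $line, $in_quote);
}

# Made-up sample: the second physical line continues the quoted field.
my $open = 0;
for my $raw (qq{"a;b";1,5;?;"open\n}, qq{still, open?";2,25;?\n}) {
    (my $fixed, $open) = fix_line($raw, $open);
    print $fixed;
}
```

The idea is that the leading remainder of an open field is split off with a separate, anchored s/// before the main substitution runs, so the main pattern never needs to know where the previous line stopped. A real file would be fed through the same sub from a while (<$fh>) loop, one line at a time.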