Recommendations for parsing invalid CSV
by markjugg (Curate) on Apr 21, 2008 at 13:58 UTC
markjugg has asked for the wisdom of the Perl Monks concerning the following question:
I need to parse invalid CSV in a situation where it's not possible to get the source to generate valid CSV. The problem is that quoting isn't handled correctly, so a field like this may appear in the file:
"call from "friend""
My best idea for handling it is to try to pre-parse it to make it valid, and then hand the result to Text::CSV_XS to finish the job. The logic for "fixing" it might be like this:
- If a quote character appears at the beginning or end of a line, or next to a comma, consider it part of the field delimiter. Otherwise, consider it an internal quote and escape it properly.
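The rule above could be sketched roughly as follows. This is only a minimal sketch: fix_inner_quotes is a hypothetical helper name (it is not part of Text::CSV_XS), and it assumes one record per line, a comma separator, and that "escaping properly" means doubling the quote character.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV_XS: double every quote that is
# neither at the start or end of the line nor adjacent to a comma.
sub fix_inner_quotes {
    my ($line) = @_;
    # A quote with a non-comma, non-newline character on both sides is
    # treated as an internal quote and doubled.
    $line =~ s/(?<=[^,\r\n])"(?=[^,\r\n])/""/g;
    return $line;
}

print fix_inner_quotes('"call from "friend""'), "\n";
# prints "call from ""friend"""
```

The repaired line could then be handed to Text::CSV_XS as usual. Note that this sketch would also double quotes that were already escaped correctly, and it cannot tell an internal quote that happens to sit next to a comma from a real delimiter.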
I realize there are edge cases that wouldn't be handled by this logic, but this does seem like a case that is difficult to correct both perfectly and automatically.
My hope is that someone might know of a CSV parsing tool that tolerates this particular kind of brokenness.