http://qs321.pair.com?node_id=522309


in reply to Regex with malformed CSV files

I'd probably just document the fact that Outlook's output is broken and not supported, or do what jZed suggests. You may be able to do a better job if you parse the problem lines yourself and examine individual fields for clues about where you are in a "line".

For example, most fields will probably not have embedded newlines in them; a zip code will probably not have embedded quotes or commas and will probably be short; email addresses will tend to have '@' in them; quoted fields will probably not stretch for hundreds of characters; etc.

If you enforce some rules like this, your parser may be able to determine most of the time where it is. Of course, you could go stark raving mad in a futile effort trying to figure out the perfect ruleset...
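To make the idea concrete, here is a minimal sketch of one such rule. It assumes, purely for illustration, that the last column of every record is a quoted US zip code; physical lines are glued together until the buffer ends in something that looks like one. The sub name join_records and the column layout are hypothetical and would need adapting to the real export.

```perl
use strict;
use warnings;

# Merge physical lines into logical records.
# Assumption (not from the original data): the final field of each
# record is a quoted 5-digit or ZIP+4 code, e.g. ,"22030"
sub join_records {
    my @lines = @_;
    my ( @records, $buffer );
    for my $line (@lines) {
        chomp $line;
        # Keep appending lines until the buffer looks complete.
        $buffer = defined $buffer ? "$buffer\n$line" : $line;
        if ( $buffer =~ /,"\d{5}(?:-\d{4})?"\s*\z/ ) {
            push @records, $buffer;
            undef $buffer;
        }
    }
    # Anything left over never matched the rule; keep it for inspection.
    push @records, $buffer if defined $buffer;
    return @records;
}

my @records = join_records(
    '"Bob","12 Main St',
    'Apt 4","bob@example.com","22030"',
    '"Ann","1 Elm","ann@example.com","90210"',
);
print scalar(@records), " logical records\n";
```

A rule like this is only as good as the assumption behind it: a street address that happens to end a physical line with a quoted five-digit number would split a record early, which is exactly the "perfect ruleset" trap mentioned above.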

Re^2: Regex with malformed CSV files
by Anonymous Monk on Jan 10, 2006 at 21:16 UTC
    Well, so far the above solution has solved the newline part of Outlook's malformed output. To handle the embedded quotes I'm using...
    $line =~ s/(?<!,)"{1,2}(?!,)/""/g;
    Basically, it replaces one or two quotes that are neither preceded nor followed by a comma with two quotes. This still won't handle a line such as

    "my address is ",320 main street" virginia", but in all honesty... how often is that going to happen? :) Thanks for the help, everyone.
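One thing worth checking before relying on that substitution: the negative lookarounds also succeed at the very start and end of the string, so the quote that opens the first field and the one that closes the last field get doubled as well. The sketch below compares the posted regex with a hypothetical variant (fix_inner is my name, not something from the thread) that requires an actual non-comma, non-quote character on both sides.

```perl
use strict;
use warnings;

# The substitution exactly as posted above.
sub fix_posted {
    my ($line) = @_;
    $line =~ s/(?<!,)"{1,2}(?!,)/""/g;
    return $line;
}

# Hypothetical variant: only touch quotes that have a real character
# on each side which is neither a comma nor a quote, so quotes at the
# start or end of the line (or before a newline) are left alone.
sub fix_inner {
    my ($line) = @_;
    $line =~ s/(?<=[^,"])"{1,2}(?=[^,"\r\n])/""/g;
    return $line;
}

my $line = '"a","say "hi" now","c"';
print "posted: ", fix_posted($line), "\n";   # ""a","say ""hi"" now","c""
print "inner:  ", fix_inner($line),  "\n";   # "a","say ""hi"" now","c"
```

Neither version helps with the `",320 main street"` case, since a stray quote next to a comma is indistinguishable from a real field boundary at the regex level.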