
Re: Regex with malformed CSV files

by bluto (Curate)
on Jan 10, 2006 at 20:33 UTC

in reply to Regex with malformed CSV files

I'd probably just document the fact that Outlook's output is broken and not supported, or do what jZed suggests. You may be able to do a better job if you parse the problem lines yourself and examine individual fields for clues about where you are within a logical "line".
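One way to start parsing the problem lines yourself is to glue physical lines back together until the quotes balance. This is a minimal sketch, not a complete CSV parser: read_logical_record is a hypothetical helper, and it assumes conventional quoting, where a literal quote inside a field is written as two quotes. Outlook's unescaped embedded quotes break that assumption, which is where examining the individual fields comes in.

```perl
use strict;
use warnings;

# Hypothetical helper: read physical lines from $fh until the number of
# double-quote characters seen so far is even. With conventional CSV
# quoting (a literal quote inside a field written as ""), an odd count
# means we are still inside a quoted field that spans a newline.
sub read_logical_record {
    my ($fh) = @_;
    my $record;
    while (defined(my $line = <$fh>)) {
        $record = defined $record ? $record . $line : $line;
        my $quotes = ($record =~ tr/"//);    # tr/// with an empty list just counts
        return $record if $quotes % 2 == 0;
    }
    return $record;    # undef at EOF, or an unterminated (malformed) record
}

# Usage: read from an in-memory file whose first record has an embedded newline.
my $csv = qq{"Smith","1 Main St\nApt 2","22101"\n"Jones","5 Elm St","22102"\n};
open my $in, '<', \$csv or die "cannot open in-memory file: $!";
my @records;
while (defined(my $rec = read_logical_record($in))) {
    push @records, $rec;
}
print scalar(@records), " records\n";    # prints "2 records"
```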

For example, most fields will probably not contain embedded newlines; a zip code will probably not contain quotes or commas and will probably be short; email addresses will tend to contain '@'; quoted fields will probably not stretch for hundreds of characters; etc.
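Rules like these can be written as small per-column checks. This is only a sketch: the column layout (name, email, zip), the 100-character limit on names, and the US-style zip pattern are assumptions made for illustration, and suspicious_fields is a hypothetical helper. A record whose fields come back flagged is a hint that a stray quote or comma has shifted the fields.

```perl
use strict;
use warnings;

# Hypothetical column layout for this sketch: name, email, zip.
my @columns = qw(name email zip);

# One plausibility rule per column; each returns true if the value looks sane.
my %looks_ok = (
    name  => sub { $_[0] !~ /\n/ && length($_[0]) <= 100 },   # no newline, short
    email => sub { $_[0] =~ /\@/ && $_[0] !~ /[\s,"]/ },      # has '@', no spaces/commas/quotes
    zip   => sub { $_[0] =~ /\A\d{5}(?:-\d{4})?\z/ },         # 12345 or 12345-6789
);

# Return the names of columns whose values break their rule.
# An empty list means the record looks plausible.
sub suspicious_fields {
    my (@fields) = @_;
    return ('field count') unless @fields == @columns;
    my @bad;
    for my $i (0 .. $#columns) {
        my $col = $columns[$i];
        push @bad, $col unless $looks_ok{$col}->($fields[$i]);
    }
    return @bad;
}

my @good    = suspicious_fields('Jane Doe', 'jane@example.com', '22101');
my @shifted = suspicious_fields('Jane Doe', '22101', 'jane@example.com');
# @good is empty; @shifted is ('email', 'zip')
print "shifted row flags: @shifted\n";
```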

If you enforce rules like these, your parser may be able to work out where it is most of the time. Of course, you could go stark raving mad in a futile effort to find the perfect ruleset...

Re^2: Regex with malformed CSV files
by Anonymous Monk on Jan 10, 2006 at 21:16 UTC
    Well, so far the above solution has fixed the malformed part of Outlook's output regarding newlines. To solve the problem of the embedded quotes, I'm using...
    $line =~ s/(?<!,)"{1,2}(?!,)/""/g;
    Basically, it replaces a lone quote, or a pair of quotes, with two quotes whenever there is no comma directly before or after it. This still won't handle a line such as

    "my address is ",320 main street" virginia"

    but in all honesty... how often is that going to happen? :) Thanks for the help, everyone.

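Running the substitution on a sample line shows what it does. This sketch assumes the regex is applied to a whole line that has already been chomped; under that assumption, the quote that opens the first field and the quote that closes the last field have no comma next to them, so they get doubled too. The second function is one possible adjustment (an assumption, not from the original post) that requires a real non-comma character on both sides.

```perl
use strict;
use warnings;

# The substitution from the post: replace a run of one or two quotes
# with "" when there is no comma directly before or after it.
sub escape_quotes_original {
    my ($line) = @_;
    $line =~ s/(?<!,)"{1,2}(?!,)/""/g;
    return $line;
}

# Hypothetical variant: require a non-comma character on each side, so
# the quotes at the very start and end of a chomped line are left alone.
sub escape_quotes_inner {
    my ($line) = @_;
    $line =~ s/(?<=[^,])"{1,2}(?=[^,])/""/g;
    return $line;
}

my $line = q{"Smith","said "hi" there","22101"};
print escape_quotes_original($line), "\n";
# ""Smith","said ""hi"" there","22101""
print escape_quotes_inner($line), "\n";
# "Smith","said ""hi"" there","22101"
```

Neither version helps with the address example above, since a quote followed by a comma inside a field still looks exactly like the end of a field.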