PerlMonks  

Re: regular expression (search and destroy)

by Roger (Parson)
on Nov 12, 2003 at 21:09 UTC ( [id://306640]=note )


in reply to regular expression (search and destroy)

You surely *can* split the record in one go -

Version 1 - '\' escaped quotes:
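use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;
    my @rec;
    # was - foreach (split /"(.*,.*)"|,/) ...
    # Split on a quoted field (which may contain \") or on a bare comma;
    # the capture group returns the body of the quoted field.
    foreach (split /"((?:\\"|.)*?)"|,/) {
        # Skip the undef/empty pieces split produces, but keep a literal 0.
        push @rec, $_ if defined $_ && length $_;
    }
    print Dumper(\@rec);
}

__DATA__
1,"Hello, world",This is good,2
121212,"Simpson, Bart",Springfield,"Roger"
121212,"2\" tape, \"white",springfield,"Roger"
121212,"Simpson \", Bart",Springfield,"Roger"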
And the output is -
$VAR1 = [
          '1',
          'Hello, world',
          'This is good',
          '2'
        ];
$VAR1 = [
          '121212',
          'Simpson, Bart',
          'Springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          '2\\" tape, \\"white',
          'springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          'Simpson \\", Bart',
          'Springfield',
          'Roger'
        ];
Update: There was a minor flaw in the original solution: it did not handle escaped quotes inside a quoted field. The code above is the enhanced version.

Version 2 - '"' escaped quotes:
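use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;
    my @rec;
    # The closing quote is one that is not part of a doubled "" pair.
    foreach (split /"(.*?)(?:(?<!")"(?!")|(?<="")"(?!"))|,/) {
        # Skip the undef/empty pieces split produces, but keep a literal 0.
        next unless defined $_ && length $_;
        s/""/"/g;    # un-double the escaped quotes
        push @rec, $_;
    }
    print Dumper(\@rec);
}

__DATA__
1,"Hello, world",This is good,2
121212,"Simpson, Bart",Springfield,"Roger"
121212,"2"" tape, ""white",springfield,"Roger"
121212,"Simpson "", Bart",Springfield,"Roger"
121212,"2""",springfield,"Roger"
And the output is -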
$VAR1 = [
          '1',
          'Hello, world',
          'This is good',
          '2'
        ];
$VAR1 = [
          '121212',
          'Simpson, Bart',
          'Springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          '2" tape, "white',
          'springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          'Simpson ", Bart',
          'Springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          '2"',
          'springfield',
          'Roger'
        ];
Update: Thanks to antirice for pointing out that a quote inside a quoted field is escaped by doubling it (""), not with a backslash (\).
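For comparison, here is a minimal sketch of the doubled-quote convention that walks the line field by field with \G matches instead of split. It is not part of the original solution; the helper name parse_csv_line is hypothetical, and the sketch assumes one record per line with no embedded newlines. Unlike a split followed by a truth test, it keeps empty fields and a literal 0. Malformed input (for example an unterminated quote) is not diagnosed.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): parse one CSV line
# in which a quote inside a quoted field is written as "".
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    while (1) {
        if ($line =~ /\G"((?:[^"]|"")*)"/gc) {
            # Quoted field: un-double the embedded quotes.
            (my $field = $1) =~ s/""/"/g;
            push @fields, $field;
        }
        else {
            # Unquoted field: everything up to the next comma (may be empty).
            $line =~ /\G([^,]*)/gc;
            push @fields, $1;
        }
        # A comma means another field follows; anything else ends the record.
        last unless $line =~ /\G,/gc;
    }
    return @fields;
}

for my $line ('121212,"2"" tape, ""white",springfield,"Roger"', '1,,0') {
    print join('|', parse_csv_line($line)), "\n";
}
# prints:
# 121212|2" tape, "white|springfield|Roger
# 1||0
```

The /c modifier keeps pos() in place when the quoted-field pattern fails, so the unquoted pattern can retry from the same offset.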

Re: Re: regular expression (search and destroy)
by antirice (Priest) on Nov 12, 2003 at 23:24 UTC

    If I recall correctly, empty fields are allowed, and a double quote is escaped as "" rather than \". In other words, these are allowed:

    1,"Hello world",,2
    2,"""Hello world""",,2

    I also believe that if a field contains a double quote, the entire field must be enclosed in double quotes so that the escaping rules can apply. There are more rules, but they mostly concern allowed values and input record separators, and they matter mainly when saving a file or verifying that the file you have really is CSV. I think you can read more about them at the end of the documentation here.
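The quoting rule for writing can be sketched as follows. This is only an illustration: the helper names csv_quote and format_csv_line are hypothetical, and quoting fields that contain a comma or a line break is the common CSV convention (as in RFC 4180), which goes slightly beyond what is stated above.

```perl
use strict;
use warnings;

# Hypothetical helper: quote a single field only when it needs quoting.
sub csv_quote {
    my ($field) = @_;
    # A field containing a quote, comma, or line break is enclosed in
    # double quotes, and each embedded quote is doubled.
    if ($field =~ /[",\r\n]/) {
        (my $escaped = $field) =~ s/"/""/g;
        return qq{"$escaped"};
    }
    return $field;
}

# Hypothetical helper: join already-quoted fields into one record.
sub format_csv_line {
    return join ',', map { csv_quote($_) } @_;
}

print format_csv_line(1, 'Hello, world', '', '2" tape'), "\n";
# prints: 1,"Hello, world",,"2"" tape"
```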

    Of course, this is only if you want valid CSV or CSV produced by programs such as Excel and whatnot.

    Hope this helps.

    antirice    
    The first rule of Perl club is - use Perl
    The ith rule of Perl club is - follow rule i - 1 for i > 1
