You surely *can* split the record in one go, with a single split whose pattern captures the quoted fields:
Version 1 - '\' escaped quotes:
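use strict;
use warnings;
use Data::Dumper;

while (<DATA>)
{
    chomp;
    my @rec;
    # was - foreach (split /"(.*,.*)"|,/) ...
    # split also returns the captured text, so quoted fields survive.
    # Skip the undef and empty pieces; a plain "if $_" would also drop "0".
    foreach (split /"((?:\\"|.)*?)"|,/)
    {
        push @rec, $_ if defined $_ && length $_;
    }
    print Dumper(\@rec);
}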
__DATA__
1,"Hello, world",This is good,2
121212,"Simpson, Bart",Springfield,"Roger"
121212,"2\" tape, \"white",springfield,"Roger"
121212,"Simpson \", Bart",Springfield,"Roger"
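And the output is below. Note that Version 1 leaves the backslash escapes in the fields; Data::Dumper prints each backslash as \\.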
$VAR1 = [
          '1',
          'Hello, world',
          'This is good',
          '2'
        ];
$VAR1 = [
          '121212',
          'Simpson, Bart',
          'Springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          '2\\" tape, \\"white',
          'springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          'Simpson \\", Bart',
          'Springfield',
          'Roger'
        ];
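To see why the filter inside the loop is needed, it may help to look at what split returns before anything is thrown away. The following is only a small sketch using the Version 1 pattern on the first data line; the variable names are mine. When the pattern contains a capturing group, split returns the captured text as an extra list element after each field, and that element is undef whenever the bare-comma alternative matched.

```perl
use strict;
use warnings;

# Same pattern as Version 1: group 1 captures a quoted field,
# a bare comma separates the unquoted ones.
my $line = '1,"Hello, world",This is good,2';
my @raw  = split /"((?:\\"|.)*?)"|,/, $line;

# Show every element split returns, including the undef and empty
# ones that the loop in Version 1 discards.
print join('|', map { defined $_ ? "[$_]" : 'undef' } @raw), "\n";
# prints: [1]|undef|[]|[Hello, world]|[]|undef|[This is good]|undef|[2]
```

The empty strings are the "fields" that sit between a comma and the quote next to it, which is also why this approach cannot tell them apart from a genuinely empty field.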
Update: The original solution had a minor flaw: it did not look for escaped quotes inside a quoted field. The pattern in Version 1 above is the enhanced one; the original is kept in its comment.
Version 2 - '"' escaped quotes:
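use strict;
use warnings;
use Data::Dumper;

while (<DATA>)
{
    chomp;
    my @rec;
    # The closing quote is one that is not part of a doubled "" pair.
    foreach (split /"(.*?)(?:(?<!")"(?!")|(?<="")"(?!"))|,/)
    {
        # Skip the undef and empty pieces; a plain "if $_" would also drop "0".
        next unless defined $_ && length $_;
        s/""/"/g;    # collapse each doubled quote to a single quote
        push @rec, $_;
    }
    print Dumper(\@rec);
}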
__DATA__
1,"Hello, world",This is good,2
121212,"Simpson, Bart",Springfield,"Roger"
121212,"2"" tape, ""white",springfield,"Roger"
121212,"Simpson "", Bart",Springfield,"Roger"
121212,"2""",springfield,"Roger"
And the output is:
$VAR1 = [
          '1',
          'Hello, world',
          'This is good',
          '2'
        ];
$VAR1 = [
          '121212',
          'Simpson, Bart',
          'Springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          '2" tape, "white',
          'springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          'Simpson ", Bart',
          'Springfield',
          'Roger'
        ];
$VAR1 = [
          '121212',
          '2"',
          'springfield',
          'Roger'
        ];
Update: Thanks to antirice for pointing out that a quote inside a quoted field is escaped by doubling it (""), not with a backslash (\).
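One limitation of the split approach is that genuinely empty fields disappear along with the empty pieces split produces. If that matters, a different technique is to match one field at a time instead of splitting on separators. What follows is only a rough sketch of that alternative, not the method used above: the helper name parse_csv_line is made up for illustration, it assumes the doubled-quote convention, and it makes no attempt to handle malformed lines or quoted fields that span several lines.

```perl
use strict;
use warnings;

# parse_csv_line is a hypothetical helper, not part of the original post.
# It matches one field at a time instead of splitting on separators,
# so empty fields and "0" are kept.
sub parse_csv_line {
    my ($line) = @_;
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,"]*))(,|\z)/g) {
        # Copy the captures before running another regex.
        my ($quoted, $bare, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # a doubled quote stands for one quote
            push @fields, $quoted;
        }
        else {
            push @fields, $bare;
        }
        last if $sep eq '';         # matched end of line, not a comma
    }
    return @fields;
}

my @rec = parse_csv_line('121212,"2"" tape, ""white",,0,"Roger"');
print join('|', @rec), "\n";
# prints: 121212|2" tape, "white||0|Roger
```

If the line is malformed (a stray quote in an unquoted field, say), the match simply fails and the sub returns the fields found so far, so for real data a dedicated CSV module is probably the safer choice.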