sphalaris has asked for the wisdom of the Perl Monks concerning the following question:
O Munificent Ones!
The two lines of data emanate from a MySql CSV/dump, which I'm trying to prepare for insertion into an SQLite table. (The import function on SQlitebrowser doesn't parse the dump correctly).
As you can see the CSV dump contains a quoted text field, which, unfortunately, sometimes contains the ", " separator of the CSV line.
We are told not to try to parse CSV ourselves, but to get Text::CSV to do it....but that module, as I use it, doesn't cope with this either. Have I overlooked some attribute which is supposed to deal with this?
Any suggestions gratefully received.
use strict; use Text::CSV;
my @lines=<DATA>;
my $csv = Text::CSV->new ({sep_char => ', ' });
my @AoA;
for (@lines) { if (/^\((.+?)\).?/){my $con=$1;
if ($csv->parse($con)) {
my @fields = $csv->fields(); push @AoA,[@fields];
} else { warn "Line could not be parsed: $_\n"; }
}
}
$"="//";for (@AoA){print "no_fields: $#$_ : @$_\n";}
__DATA__
(101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields,
+ so no go', 5.11),
(102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie
+ld', 6.11),
Re: Text::CSV - parsing
by Eily (Monsignor) on Mar 14, 2017 at 18:06 UTC
|
Actually Text::CSV handles quoted fields pretty well, but the default quote character is ". See the quote_char section of the Text::CSV. Note that the sep_char is a single character though, so spaces are not removed, which would make 101, '1997' invalid CSV, because the quote_char should be the next thing after the sep_char. To allow spaces, set allow_whitespace to 1:
use strict; use Text::CSV;
use Data::Dumper;
my @lines=<DATA>;
my $csv = Text::CSV->new ({sep_char => ',' , quote_char => "'", allow
+_whitespace => 1});
my @AoA;
for (@lines) { if (/^\((.+?)\).?/){my $con=$1;
if ($csv->parse($con)) {
my @fields = $csv->fields(); push @AoA,[@fields];
} else { warn "Line could not be parsed: $_\n"; }
}
}
print Dumper \@AoA;
__DATA__
(101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields,
+ so no go', 5.11),
(102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie
+ld', 6.11),
$VAR1 = [
[
'101',
'1997-02-25',
'S1',
'31.00',
'NULL',
'0.00',
'this becomes two fields, so no go',
'5.11'
],
[
'102',
'1998-03-26',
'S1',
'31.00',
'NULL',
'0.00',
'this will remain one field',
'6.11'
]
];
| [reply] [d/l] [select] |
|
Good post, just a note: this will not work right if the quoted strings happen to be broken onto two lines. In that case, Text::CSV's getline is better (see the section "Embedded newlines" in the doc):
use warnings;
use strict;
use Text::CSV;
use Data::Dump;
my $csv = Text::CSV->new ({ binary=>1, auto_diag=>2,
quote_char=>"'", allow_whitespace=>1 });
while ( my $row = $csv->getline( \*DATA ) ) {
dd $row;
}
$csv->eof;
__DATA__
(101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields,
so no go', 5.11),
(102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this
will remain one field', 6.11),
Output:
[
"(101",
"1997-02-25",
"S1",
"31.00",
"NULL",
"0.00",
"this becomes two fields,\nso no go",
"5.11)",
"",
]
[
"(102",
"1998-03-26",
"S1",
"31.00",
"NULL",
"0.00",
"this\nwill remain one field",
"6.11)",
"",
]
| [reply] [d/l] [select] |
|
Well, this doesn't deal with the parentheses properly, so this would make your ++solution a better one then. You don't have to worry about embedded newlines, enclosing parentheses, the non default quote character or extra white space. And parsing DB data as SQL makes sense :)
| [reply] |
Re: Text::CSV - parsing
by haukex (Archbishop) on Mar 14, 2017 at 18:10 UTC
|
Eily has already shown you how to set the options to Text::CSV to parse your data. I'd like to show an alternative: Since it's SQL*, use an SQL parser! Here, I'm using SQL::Statement:
use warnings;
use strict;
use SQL::Statement;
use Data::Dump;
my $sql = <<'END_SQL';
INSERT INTO foo VALUES
(101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields,
+ so no go', 5.11),
(102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie
+ld', 6.11)
END_SQL
my $parser = SQL::Parser->new();
$parser->{RaiseError}=1;
my $stmt = SQL::Statement->new($sql,$parser);
dd $stmt->row_values;
__END__
(
[
101,
"1997-02-25",
"S1",
"31.00",
undef,
"0.00",
"this becomes two fields, so no go",
5.11,
],
[
102,
"1998-03-26",
"S1",
"31.00",
undef,
"0.00",
"this will remain one field",
6.11,
],
)
Update: * Hmm, at least it looks more like SQL than CSV to me, although you did say it's a "MySql CSV/dump". | [reply] [d/l] |
Re: Text::CSV - parsing
by huck (Prior) on Mar 14, 2017 at 18:07 UTC
|
while it is more common to see the double-quote used, a single can also be used like this
my $csv = Text::CSV->new ({sep_char => ',',quote_char => "'" });
http://search.cpan.org/~ishigaki/Text-CSV-1.91/lib/Text/CSV.pm#quote_char
note i also fixed your sep_char to only have one char
http://search.cpan.org/~ishigaki/Text-CSV-1.91/lib/Text/CSV.pm#sep_char | [reply] [d/l] |
Re: Text::CSV - parsing
by Tux (Canon) on Mar 15, 2017 at 14:01 UTC
|
use 5.18.2;
use Text::CSV;
use Data::Peek;
my $csv = Text::CSV->new ({ sep => ", ", quote => "'", auto_diag => 1
+ });
while (my $row = $csv->getline (*DATA)) {
DDumper $row;
}
__DATA__
(101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields,
+ so no go', 5.11),
(102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie
+ld', 6.11),
perl test.pl
[ '(101',
'1997-02-25',
'S1',
'31.00',
'NULL',
'0.00',
'this becomes two fields, so no go',
'5.11),'
]
[ '(102',
'1998-03-26',
'S1',
'31.00',
'NULL',
'0.00',
'this will remain one field',
'6.11),'
]
But as you want to strip leading ( and trailing ) I think the suggested use of SQL::Statement is fine!
As a side note, newer Text::CSV and Text::CSV_XS can put the while file into the @AoA in one single statement with a filter to clean up the parens:
use Text::CSV qw( csv );
use Data::Peek;
my $aoa = csv (in => *DATA, sep => ", ", quo => "'",
on_in => sub { $_[1][0] =~ s/^\(//; $_[1][-1] =~ s/\),?\s*$//; });
DDumper $aoa;
[
[ 101,
'1997-02-25',
'S1',
'31.00',
'NULL',
'0.00',
'this becomes two fields, so no go',
'5.11'
],
[ 102,
'1998-03-26',
'S1',
'31.00',
'NULL',
'0.00',
'this will remain one field',
'6.11'
]
]
Enjoy, Have FUN! H.Merijn
| [reply] [d/l] [select] |
Re: Text::CSV - parsing
by sphalaris (Novice) on Mar 14, 2017 at 18:58 UTC
|
Wow - three super replies at the speed of light: many thanks all.
I thought I'd tried the quote_char option, but perhaps not correctly, and certainly not the whitespace, and didn't appreciate the point about single char. separators.
SQL::Statement looks interesting, especially if it can format "create" and "insert": it was disappointing to find that sqlitebrowser doesn't get any sense out of the mysql dump (from phpmyadmin).
Thanks again!
| [reply] |
|
|