Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Text::CSV - parsing

by sphalaris (Novice)
on Mar 14, 2017 at 17:51 UTC ( [id://1184575]=perlquestion: print w/replies, xml ) Need Help??

sphalaris has asked for the wisdom of the Perl Monks concerning the following question:

O Munificent Ones!

The two lines of data emanate from a MySql CSV/dump, which I'm trying to prepare for insertion into an SQLite table. (The import function on SQlitebrowser doesn't parse the dump correctly).

As you can see the CSV dump contains a quoted text field, which, unfortunately, sometimes contains the ", " separator of the CSV line.

We are told not to try to parse CSV ourselves, but to get Text::CSV to do it....but that module, as I use it, doesn't cope with this either. Have I overlooked some attribute which is supposed to deal with this? Any suggestions gratefully received.

use strict; use Text::CSV; my @lines=<DATA>; my $csv = Text::CSV->new ({sep_char => ', ' }); my @AoA; for (@lines) { if (/^\((.+?)\).?/){my $con=$1; if ($csv->parse($con)) { my @fields = $csv->fields(); push @AoA,[@fields]; } else { warn "Line could not be parsed: $_\n"; } } } $"="//";for (@AoA){print "no_fields: $#$_ : @$_\n";} __DATA__ (101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields, + so no go', 5.11), (102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie +ld', 6.11),

Replies are listed 'Best First'.
Re: Text::CSV - parsing
by Eily (Monsignor) on Mar 14, 2017 at 18:06 UTC

    Actually Text::CSV handles quoted fields pretty well, but the default quote character is ". See the quote_char section of the Text::CSV. Note that the sep_char is a single character though, so spaces are not removed, which would make 101, '1997' invalid CSV, because the quote_char should be the next thing after the sep_char. To allow spaces, set allow_whitespace to 1:

    use strict; use Text::CSV; use Data::Dumper; my @lines=<DATA>; my $csv = Text::CSV->new ({sep_char => ',' , quote_char => "'", allow +_whitespace => 1}); my @AoA; for (@lines) { if (/^\((.+?)\).?/){my $con=$1; if ($csv->parse($con)) { my @fields = $csv->fields(); push @AoA,[@fields]; } else { warn "Line could not be parsed: $_\n"; } } } print Dumper \@AoA; __DATA__ (101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields, + so no go', 5.11), (102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie +ld', 6.11),
    $VAR1 = [ [ '101', '1997-02-25', 'S1', '31.00', 'NULL', '0.00', 'this becomes two fields, so no go', '5.11' ], [ '102', '1998-03-26', 'S1', '31.00', 'NULL', '0.00', 'this will remain one field', '6.11' ] ];

      Good post, just a note: this will not work right if the quoted strings happen to be broken onto two lines. In that case, Text::CSV's getline is better (see the section "Embedded newlines" in the doc):

      use warnings; use strict; use Text::CSV; use Data::Dump; my $csv = Text::CSV->new ({ binary=>1, auto_diag=>2, quote_char=>"'", allow_whitespace=>1 }); while ( my $row = $csv->getline( \*DATA ) ) { dd $row; } $csv->eof; __DATA__ (101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields, so no go', 5.11), (102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one field', 6.11),

      Output:

      [ "(101", "1997-02-25", "S1", "31.00", "NULL", "0.00", "this becomes two fields,\nso no go", "5.11)", "", ] [ "(102", "1998-03-26", "S1", "31.00", "NULL", "0.00", "this\nwill remain one field", "6.11)", "", ]

        Well, this doesn't deal with the parentheses properly, so this would make your ++solution a better one then. You don't have to worry about embedded newlines, enclosing parentheses, the non default quote character or extra white space. And parsing DB data as SQL makes sense :)

Re: Text::CSV - parsing
by haukex (Archbishop) on Mar 14, 2017 at 18:10 UTC

    Eily has already shown you how to set the options to Text::CSV to parse your data. I'd like to show an alternative: Since it's SQL*, use an SQL parser! Here, I'm using SQL::Statement:

    use warnings; use strict; use SQL::Statement; use Data::Dump; my $sql = <<'END_SQL'; INSERT INTO foo VALUES (101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields, + so no go', 5.11), (102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie +ld', 6.11) END_SQL my $parser = SQL::Parser->new(); $parser->{RaiseError}=1; my $stmt = SQL::Statement->new($sql,$parser); dd $stmt->row_values; __END__ ( [ 101, "1997-02-25", "S1", "31.00", undef, "0.00", "this becomes two fields, so no go", 5.11, ], [ 102, "1998-03-26", "S1", "31.00", undef, "0.00", "this will remain one field", 6.11, ], )

    Update: * Hmm, at least it looks more like SQL than CSV to me, although you did say it's a "MySql CSV/dump".

Re: Text::CSV - parsing
by huck (Prior) on Mar 14, 2017 at 18:07 UTC
Re: Text::CSV - parsing
by Tux (Canon) on Mar 15, 2017 at 14:01 UTC

    Recentish Text::CSV (and Text::CSV_XS) can deal with separation wider than a single character:

    use 5.18.2; use Text::CSV; use Data::Peek; my $csv = Text::CSV->new ({ sep => ", ", quote => "'", auto_diag => 1 + }); while (my $row = $csv->getline (*DATA)) { DDumper $row; } __DATA__ (101, '1997-02-25', 'S1', 31.00, NULL, 0.00, 'this becomes two fields, + so no go', 5.11), (102, '1998-03-26', 'S1', 31.00, NULL, 0.00, 'this will remain one fie +ld', 6.11),
    perl test.pl [ '(101', '1997-02-25', 'S1', '31.00', 'NULL', '0.00', 'this becomes two fields, so no go', '5.11),' ] [ '(102', '1998-03-26', 'S1', '31.00', 'NULL', '0.00', 'this will remain one field', '6.11),' ]

    But as you want to strip leading ( and trailing ) I think the suggested use of SQL::Statement is fine!

    As a side note, newer Text::CSV and Text::CSV_XS can put the while file into the @AoA in one single statement with a filter to clean up the parens:

    use Text::CSV qw( csv ); use Data::Peek; my $aoa = csv (in => *DATA, sep => ", ", quo => "'", on_in => sub { $_[1][0] =~ s/^\(//; $_[1][-1] =~ s/\),?\s*$//; }); DDumper $aoa;
    [ [ 101, '1997-02-25', 'S1', '31.00', 'NULL', '0.00', 'this becomes two fields, so no go', '5.11' ], [ 102, '1998-03-26', 'S1', '31.00', 'NULL', '0.00', 'this will remain one field', '6.11' ] ]

    Enjoy, Have FUN! H.Merijn
Re: Text::CSV - parsing
by sphalaris (Novice) on Mar 14, 2017 at 18:58 UTC

    Wow - three super replies at the speed of light: many thanks all.

    I thought I'd tried the quote_char option, but perhaps not correctly, and certainly not the whitespace, and didn't appreciate the point about single char. separators.

    SQL::Statement looks interesting, especially if it can format "create" and "insert": it was disappointing to find that sqlitebrowser doesn't get any sense out of the mysql dump (from phpmyadmin).

    Thanks again!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1184575]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others goofing around in the Monastery: (4)
As of 2024-04-19 00:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found