PerlMonks  

One-line CSV Parser

by BiffBaker (Novice)
on Apr 03, 2008 at 19:47 UTC

How is this for quick parsing of standard CSV?
map {if (/^".*"$/) {s/\s*"\s*//g; push @fields, $_;} else {push @fields, split /\s*,\s*/;}} split /,*("[^"]*"),*/, $line;
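
For readers who want to see what the one-liner does step by step, here is a rough expansion of the same logic into a loop. The sub name `parse_simple_csv` is made up for illustration and is not part of the original post; the regexes are copied from the one-liner unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper name; the logic mirrors the one-liner above.
sub parse_simple_csv {
    my ($line) = @_;
    my @fields;

    # Split on quoted chunks (plus adjacent commas); the capture group
    # makes split return each quoted chunk as its own list element.
    for my $chunk ( split /,*("[^"]*"),*/, $line ) {
        if ( $chunk =~ /^".*"$/ ) {
            # Quoted chunk: strip the quotes and any whitespace beside them.
            ( my $field = $chunk ) =~ s/\s*"\s*//g;
            push @fields, $field;
        }
        else {
            # Unquoted chunk: may still hold several comma-separated fields.
            push @fields, split /\s*,\s*/, $chunk;
        }
    }
    return @fields;
}

my @parsed = parse_simple_csv('a, b,"c, d",e');
print join( '|', @parsed ), "\n";    # prints "a|b|c, d|e"
```

Note that the final split drops trailing empty fields, so a line such as `a,b,` yields two fields rather than three.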

Replies are listed 'Best First'.
Re: One-line CSV Parser
by duelafn (Parson) on Apr 03, 2008 at 23:36 UTC

    No escaping:

    $line = 'bob,"I said \"foo\"",bar';

    Could be shorter (matches identically):

    @fields = map { /^"\s*(.*?)\s*"$/ ? $1 : split /\s*,\s*/ } split /,*("[^"]*"),*/, $line

    Update: This too:

    $line = 'bob,I said "foo",bar'

    I originally had the following, which shows a trick for putting multiple things in a ?: operation:

    @fields = map { /^".*"$/ ? do{s/\s*"\s*//g; $_} : split(/\s*,\s*/) } split /,*("[^"]*"),*/, $line

    Update 2: Hmm, that's what I get for not reading the RFC. Both examples above are non-counterexamples (counter-counterexamples?); see the other posts below.

    Good Day,
        Dean

Re: One-line CSV Parser
by idsfa (Vicar) on Apr 04, 2008 at 14:10 UTC

    CSV as defined by RFC 4180 does not "escape" double quotes with a backslash, but rather by an additional set of double quotes. Your parser fails to handle this format properly.

    CSV is hard.
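
To make the failure concrete, here is a small sketch that runs the original post's logic (rewritten as a loop, with the same regexes) against the doubled-quote example from RFC 4180. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = '"aaa","b""bb","ccc"';    # example line from RFC 4180
my @fields;

# Same regexes as the original one-liner, written as a loop.
for my $chunk ( split /,*("[^"]*"),*/, $line ) {
    if ( $chunk =~ /^".*"$/ ) {
        ( my $field = $chunk ) =~ s/\s*"\s*//g;
        push @fields, $field;
    }
    else {
        push @fields, split /\s*,\s*/, $chunk;
    }
}

print scalar(@fields), " fields: ", join( '|', @fields ), "\n";
# prints "4 fields: aaa|b|bb|ccc"
```

The RFC reading of that line is three fields (aaa, b"bb, ccc); the split treats the doubled quote as the end of one quoted chunk and the start of another.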


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
Re: One-line CSV Parser
by radiantmatrix (Parson) on Apr 08, 2008 at 16:16 UTC

    I much prefer

    use Text::CSV_XS;
    use IO::File;

    my $io = IO::File->new( $filename, '<' )
        or die "Can't read $filename: $!";
    my $csv = Text::CSV_XS->new();

    until ( $io->eof ) {
        my $row = $csv->getline($io);
        # do something with the ARRAYref $row
    }

    It's not much longer, but provides all kinds of error handling, is easier to maintain (and easier to read), and handles all the ins and outs of escaping, etc. As idsfa says, "CSV is hard".

    Not only that, Text::CSV_XS is really fast to boot.

    Don't do stuff yourself that others have already done and tested thoroughly -- this is true Laziness.

    <radiant.matrix>
    Ramblings and references
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: One-line CSV Parser
by ww (Archbishop) on Apr 04, 2008 at 16:01 UTC

    From the cited RFC,

    If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote. For example:

    "aaa","b""bb","ccc"

    and

    ...from the preceding paragraph of the RFC ( http://tools.ietf.org/html/rfc4180#section-2 ):

    If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
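
As a rough illustration of those two rules (not a replacement for a tested module), the following sketch parses a single line with a `\G`-anchored regex, using only core Perl. The sub name `parse_rfc4180_line` is hypothetical. It assumes well-formed input with no line breaks inside fields; on malformed input it simply stops and returns whatever it has parsed so far.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one CSV line using RFC 4180-style quoting.
sub parse_rfc4180_line {
    my ($line) = @_;
    my @fields;

    # Each iteration consumes one field plus its trailing comma or end of line.
    #   $1: body of a quoted field (doubled quotes still doubled)
    #   $2: an unquoted field (no quotes or commas allowed)
    #   $3: the separator, ',' or '' at end of line
    while ( $line =~ /\G(?:"((?:[^"]|"")*)"|([^",]*))(,|\z)/g ) {
        my ( $quoted, $plain, $sep ) = ( $1, $2, $3 );
        if ( defined $quoted ) {
            $quoted =~ s/""/"/g;    # un-double the embedded quotes
            push @fields, $quoted;
        }
        else {
            push @fields, $plain;
        }
        last if $sep eq '';         # reached end of line
    }
    return @fields;
}

my @row = parse_rfc4180_line('"aaa","b""bb","ccc"');
print join( '|', @row ), "\n";      # prints 'aaa|b"bb|ccc'
```

Unlike a plain split, this keeps empty trailing fields, so `a,b,` gives three fields with the last one empty.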

    Note, however, that "as defined" may overstate the status of the document:

    This memo provides information for the Internet community. It does not specify an Internet standard of any kind.
    ... While there are various specifications and implementations for the CSV format (cites removed), there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files. This section documents the format that seems to be followed by most implementations:
