PerlMonks  

split on comma-separated fields, where a field may have commas inside quotes

by argv (Pilgrim)
on Jun 10, 2009 at 00:08 UTC ( [id://770143]=perlquestion )

argv has asked for the wisdom of the Perl Monks concerning the following question:

I've been away from Perl for a while, and coming back, I'm finding a situation that should be brain-dead simple, but I can't seem to hack it.

I need to do a split on a line with comma-separated fields, but some fields have commas in them, and are therefore surrounded by quotes. Here's a prototype line:

IBM,INTL BUSINESS MACHINES,"2,500",$108.14,$270350.00,$1625.00,0.60%,$126200.00, +$144150.00,+114.22%

The "2,500" is obviously the number of shares. How do I get split not to divide it into separate fields?

Replies are listed 'Best First'.
Re: split on comma-separated fields, where a field may have commas inside quotes
by toolic (Bishop) on Jun 10, 2009 at 00:20 UTC
    Text::CSV_XS is useful:
    use strict;
    use warnings;
    use Text::CSV_XS;
    use Data::Dumper;

    my $str = 'IBM,INTL BUSINESS MACHINES,"2,500",$108.14,$270350.00,$1625.00,0.60%,$126200.00, +$144150.00,+114.22%';
    my $csv = Text::CSV_XS->new();
    my $status = $csv->parse($str);
    my @columns = $csv->fields();
    print Dumper(\@columns);

    __END__

    $VAR1 = [
              'IBM',
              'INTL BUSINESS MACHINES',
              '2,500',
              '$108.14',
              '$270350.00',
              '$1625.00',
              '0.60%',
              '$126200.00',
              ' +$144150.00',
              '+114.22%'
            ];
Re: split on comma-separated fields, where a field may have commas inside quotes
by ikegami (Patriarch) on Jun 10, 2009 at 00:11 UTC
Re: split on comma-separated fields, where a field may have commas inside quotes
by JavaFan (Canon) on Jun 10, 2009 at 00:45 UTC
    Don't split, collect.
    @chunks = $str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g;
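To see what the "collect" regex returns, here is a minimal sketch that runs it over a shortened version of the prototype line. The variable names and the shortened input are assumptions made for illustration. Printing each chunk in brackets makes any empty strings visible.

```perl
use strict;
use warnings;

# Shortened version of the prototype line from the question.
my $str = 'IBM,INTL BUSINESS MACHINES,"2,500",$108.14,0.60%';

# A list-context match with /g and no capture groups returns every match.
my @chunks = $str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g;

# Show each chunk in brackets so empty strings are visible.
print "[$_]\n" for @chunks;

# The quoted field stays in one piece, surrounding quotes included.
my @quoted = grep { /^"/ } @chunks;
print "quoted: @quoted\n";
```

The quoted field comes back as one chunk with its quotes still attached, and because every part of the pattern is optional, zero-length matches appear among the chunks as well.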
      This sub-thread effectively underscores why it's such a Good Idea to use modules. There are numerous edge cases that make this a difficult problem. Chief among them: what about empty fields? Certainly they should be possible, so you can't just grep them out. But you don't want to introduce them, either.

      Then there's the issue about quoting quotes, which wasn't mentioned in this problem, but would probably come up eventually in any real-world case that gets used much. And reporting errors on malformed lines.

      That said, I've got a regex that at least seems to deal with the empty fields properly:

      /(?:^|,)((?:"[^"]*"|[^",]?)+)/g
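As a rough check of the regex above, the sketch below wraps it in a helper and feeds it a line with an empty middle field and an empty trailing field, plus a line with doubled quotes. The helper name collect_fields and the unquoting step are assumptions added for illustration; they are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect fields with the
# capture-group regex, then unwrap fields that are fully quoted.
sub collect_fields {
    my ($line) = @_;
    my @fields = $line =~ /(?:^|,)((?:"[^"]*"|[^",]?)+)/g;
    for my $field (@fields) {
        # Strip one pair of surrounding quotes, then undouble "" inside.
        if ($field =~ s/^"(.*)"$/$1/s) {
            $field =~ s/""/"/g;
        }
    }
    return @fields;
}

# Empty third field and empty trailing field are both preserved.
my @a = collect_fields('IBM,"2,500",,$108.14,');
print scalar(@a), " fields: ", join('|', @a), "\n";

# Doubled quotes inside a quoted field.
my @b = collect_fields('a,"say ""hi""",b');
print join('|', @b), "\n";
```

This still does no error reporting: a line with an unbalanced quote is not rejected, it is simply parsed into something other than what the writer intended.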

      Caution: Contents may have been coded under pressure.
      "Don't split, collect."

      I like that; I'll try to remember it.

      But the first asterisk yields empty chunks; fixed with a plus:

      @chunks = $str =~ /[^,"]+(?:"[^"]*"[^,"]*)*/g;
        Putting a + there instead of a * means that
        "foo","bar"
        is split into a single element:
        foo","bar
        which is highly unlikely to be wanted.
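A short sketch of that failure, using the + variant as posted. The second input is an assumption added here, cut down from the prototype line, to show that a quoted field containing a comma is affected as well, since a match can no longer begin on a quote character.

```perl
use strict;
use warnings;

# The variant with a leading + instead of *.
my $plus = qr/[^,"]+(?:"[^"]*"[^,"]*)*/;

# Two quoted fields collapse into a single chunk.
my @merged = '"foo","bar"' =~ /$plus/g;
print scalar(@merged), " chunk: $merged[0]\n";

# A quoted field with an embedded comma is split at that comma.
my @pieces = 'IBM,"2,500",$108.14' =~ /$plus/g;
print join('|', @pieces), "\n";
```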
        "But the first asterisk yields empty chunks"
        grep will solve that problem easily:
        @chunks = grep {$_} ($str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g);
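One caveat worth checking before relying on the grep: filtering on truth also discards a field whose value is 0, and any filter on emptiness discards fields that really were empty in the input. The sketch below uses an assumed test line, 'a,,0,b', to compare a truth test with a length test.

```perl
use strict;
use warnings;

# Line with a genuinely empty second field and a third field of 0.
my $str = 'a,,0,b';
my @chunks = $str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g;

my @by_truth  = grep { $_ } @chunks;        # drops '' and also '0'
my @by_length = grep { length } @chunks;    # keeps '0', still drops every ''

print "truth:  @by_truth\n";
print "length: @by_length\n";
```

Neither filter can tell an empty field in the data from an empty chunk produced by the regex, so the position of later fields shifts whenever the input contains an empty field.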

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: split on comma-separated fields, where a field may have commas inside quotes
by repellent (Priest) on Jun 10, 2009 at 21:18 UTC
    I'd recommend Text::xSV. Take advantage of someone else's good work!

    The problem may seem brain-dead simple, but it isn't. There are lots of edge cases to look out for.
