PerlMonks  

Re: split on comma-separated fields, where a field may have commas inside quotes

by JavaFan (Canon)
on Jun 10, 2009 at 00:45 UTC


in reply to split on comma-separated fields, where a field may have commas inside quotes

Don't split, collect.
@chunks = $str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g;
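
As a quick illustration of the collect idiom, here is a minimal sketch that runs the regex on a made-up sample line (the contents of `$str` are an assumption, not data from the original question). In list context, a `/g` match without capture groups returns every matched substring, so the fields are gathered rather than split out.

```perl
use strict;
use warnings;

# Hypothetical sample line; the original question's data is not shown here.
my $str = 'a,"b,c",d';

# List-context /g match with no capture groups: returns each full match.
my @chunks = $str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g;

# Print each chunk in brackets so that empty ones are visible.
print "[$_]\n" for @chunks;
```

The quoted field comes back as one chunk with its quotes still attached. Because every part of the pattern is optional, the match can also succeed with zero length, so expect some empty chunks in the output as well.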

Replies are listed 'Best First'.
Re^2: split on comma-separated fields, where a field may have commas inside quotes
by Roy Johnson (Monsignor) on Jun 10, 2009 at 21:00 UTC
    This sub-thread effectively underscores why it's such a Good Idea to use modules. There are numerous edge cases that make this a difficult problem. Chief among them: what about empty fields? Certainly they should be possible, so you can't just grep them out. But you don't want to introduce them, either.

    Then there's the issue of quoting quotes (a quote character inside a quoted field), which wasn't mentioned in this problem but would probably come up eventually in any real-world case that gets used much. The same goes for reporting errors on malformed lines.
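
The post does not name a module, so purely as one possible illustration, here is a sketch using `quotewords` from Text::ParseWords, which ships with Perl. The sample line is made up. It assumes shell-like quoting rules are acceptable: Text::ParseWords also treats single quotes and backslashes specially and does not understand the CSV convention of doubling a quote inside a quoted field, so a dedicated CSV module such as Text::CSV from CPAN is probably the better fit for real CSV data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical sample line with an empty second field.
my $line = 'a,,"b,c",d';

# A false second argument asks for the quotes to be stripped from the fields.
my @fields = quotewords(',', 0, $line);

print scalar(@fields), " fields\n";
print "[", (defined $_ ? $_ : ''), "]\n" for @fields;
```

Unlike the regexes in this thread, this returns the quoted field without its surrounding quotes, and the empty field is kept in position.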

    That said, I've got a regex that at least seems to deal with the empty fields properly:

    /(?:^|,)((?:"[^"]*"|[^",]?)+)/g
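
To see how this pattern treats empty fields, here is a small sketch on a made-up line (the sample data is an assumption). Since the pattern has one capture group, a list-context `/g` match returns the captured field for each match. The leading `(?:^|,)` ties each field to the start of the line or to a preceding comma, so a field between two adjacent commas is captured as an empty string rather than skipped.

```perl
use strict;
use warnings;

# Hypothetical sample line: the second field is empty.
my $str = 'a,,"b,c",d';

# One capture group, so list-context /g returns one captured field per match.
my @fields = $str =~ /(?:^|,)((?:"[^"]*"|[^",]?)+)/g;

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Quoted fields still carry their quotes, so a later step would have to strip them if the bare value is wanted.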

    Caution: Contents may have been coded under pressure.
Re^2: split on comma-separated fields, where a field may have commas inside quotes
by Anonymous Monk on Jun 10, 2009 at 13:47 UTC
    "Don't split, collect."

    I like that; I'll try to remember it.

    But the first asterisk yields empty chunks; fixed with a plus:

    @chunks = $str =~ /[^,"]+(?:"[^"]*"[^,"]*)*/g;
      Putting a + there instead of a * means that
      "foo","bar"
      is split into a single element:
      foo","bar
      which is highly unlikely to be wanted.
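
The failure described here can be reproduced with a short sketch. The input is the `"foo","bar"` example from the post; everything else is illustrative. With `+`, the match cannot begin at the opening quote, so it starts at `foo` and then consumes `","` as if it were a quoted section.

```perl
use strict;
use warnings;

# The input from the post: two quoted fields.
my $str = '"foo","bar"';

# Variant with + in place of the first *.
my @chunks = $str =~ /[^,"]+(?:"[^"]*"[^,"]*)*/g;

print scalar(@chunks), " chunk(s)\n";
print "[$_]\n" for @chunks;
```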
      "But the first asterisk yields empty chunks"
      grep will solve that problem easily:
      @chunks = grep {$_} ($str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g);
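
A short sketch of the grep filter on made-up data (the sample line is an assumption). It also compares the truth test with a length test, since `grep {$_}` discards any false value, which in Perl includes a field that is just `0`, while `grep { length }` discards only empty strings. Either way, fields that were genuinely empty in the input are dropped along with the spurious ones.

```perl
use strict;
use warnings;

# Hypothetical sample line containing a field that is just 0.
my $str = 'a,0,"b,c"';

my @raw = $str =~ /[^,"]*(?:"[^"]*"[^,"]*)*/g;

# Truth test: drops empty strings, but also the string "0".
my @by_truth = grep { $_ } @raw;

# Length test: drops only the empty strings.
my @by_length = grep { length } @raw;

print "truth:  @by_truth\n";
print "length: @by_length\n";
```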

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
