PerlMonks  

Re: Splitting a line on just commas

by deMize (Monk)
on Jun 14, 2010 at 17:34 UTC


in reply to Splitting a line on just commas

Response: I'd go the Text::CSV route, but this might help get you started:
    use strict;

    sub main {
        my $text = qq{a,b,"hey, you","str1, str2, str3",end};
        print "Input: $text\n\n";

        # Split the delimiters
        my @values = split( /(?:\,|(\".*?\"))/ , $text);

        # Remove the created blanks
        @values = grep{$_ ne ''} @values;

        # Output
        foreach (0..$#values){
            print "$_: $values[$_] \n";
        }
    }

    main();
Output:

    Input: a,b,"hey, you","str1, str2, str3",end

    0: a
    1: b
    2: "hey, you"
    3: "str1, str2, str3"
    4: end


Thoughts: I haven't really thought about why the blanks are being created; if you take away the grep, you'll see what I'm talking about. I still advise using Text::CSV, because this grep method will also remove blanks you want to keep. In other words, the code above is structurally flawed.
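To see where the blanks come from, here is a small sketch (added for illustration, not part of the original post) that runs the same split without the grep and labels each element, printing `undef` where the capture group did not participate.

```perl
use strict;
use warnings;

my $text = qq{a,b,"hey, you","str1, str2, str3",end};

# Same pattern as the post. Because it contains a capture group, split also
# returns the captured text, which is undef whenever the comma alternative matched.
my @raw = split /(?:\,|(\".*?\"))/, $text;

# Label each element so empty strings and undefs are visible.
for my $i (0 .. $#raw) {
    my $shown = defined $raw[$i] ? "'$raw[$i]'" : 'undef';
    print "$i: $shown\n";
}
```

Each quoted string is consumed as a separator, so the "field" between it and the neighbouring comma is an empty string, and every plain comma contributes an undef from the unused capture group. Under `use warnings`, comparing those undefs with `ne` is what produces the "uninitialized value" warnings.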

Example: a,b,,d,e
You probably want that empty placeholder there if you're going to be inserting this into a database. The grep would remove it because it is an empty string ("").
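A minimal sketch of that failure, using the same split and the same filtering idea (with a `defined` check added first so `use warnings` stays quiet):

```perl
use strict;
use warnings;

my $line = 'a,b,,d,e';

# Same split as the post.
my @values = split /(?:\,|(\".*?\"))/, $line;

# Drop undefs and empty strings, as the post's grep does.
my @kept = grep { defined($_) && $_ ne '' } @values;

# The intentionally empty third field is gone: 4 fields instead of 5.
print scalar(@kept), " fields: @kept\n";
```

This prints `4 fields: a b d e`, so a record that should have five columns arrives with four.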


Demize

Re^2: Splitting a line on just commas
by ikegami (Patriarch) on Jun 14, 2010 at 17:39 UTC

    Ouch! Misusing split, which you attempt to fix by filtering out empty strings, which leads to warnings and the removal of empty fields.

    Thing being separated
                vvvvvvv
    /(?:\,|(\".*?\"))/
        ^^
        Separator

    How are those two things on equal footing?
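Taking that point literally, the pattern handed to split should match only the separator. A minimal sketch, assuming well-formed input (paired double quotes, no escaped or doubled quotes, no newlines inside fields); `split_outside_quotes` is a hypothetical helper, and Text::CSV handles the cases this does not.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that sit outside double quotes.
# Assumes paired quotes, no escaped ("") quotes, and no embedded newlines.
sub split_outside_quotes {
    my ($line) = @_;
    # A comma is a separator only if an even number of quotes follows it.
    # The -1 limit keeps trailing empty fields instead of discarding them.
    return split /,(?=(?:[^"]*"[^"]*")*[^"]*\z)/, $line, -1;
}

my @fields   = split_outside_quotes(qq{a,b,"hey, you","str1, str2, str3",end});
my @with_gap = split_outside_quotes('a,b,,d,e');

print scalar(@fields), ": ", join('|', @fields), "\n";
print scalar(@with_gap), ": ", join('|', @with_gap), "\n";
```

Because the pattern has no capture group and never matches a field, there are no undefs or stray empty strings to filter, and the empty field in `a,b,,d,e` survives. The lookahead rescans to the end of the line for every comma, so this is quadratic on very long lines.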

      Response: I was about to say, you might want to remove all the undefined values created by the unmatched capturing parentheses before removing the blank fields.
      @values = grep{defined} @values;
      @values = grep{$_ ne ''} @values;
      or
      @values = grep{defined && $_ ne ''} @values;
      Again, I would not use this method: removing blank string values also throws away legitimately empty fields. As for the equal footing, would this be any less equal: /(?:\,)|(\".*?\")/

      Update: I did forget to include the trailing comma after the quotes, but I still wouldn't use it: /\,|(?:(\".*?\")\,)/
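A quick check of the updated pattern, as a sketch that counts the undef and empty elements split returns with it (no grep applied):

```perl
use strict;
use warnings;

my $text = qq{a,b,"hey, you","str1, str2, str3",end};

# The updated pattern from the post: a quoted string plus its trailing comma
# is still matched as a separator, and the quoted text is captured back in.
my @raw = split /\,|(?:(\".*?\")\,)/, $text;

my $undefs  = grep { !defined($_) } @raw;
my $empties = grep { defined($_) && $_ eq '' } @raw;
print scalar(@raw), " elements, $undefs undef, $empties empty\n";
```

It still yields 9 elements, 2 of them undef and 2 empty, because the quoted fields are still part of what split treats as the separator. The earlier `/(?:\,)|(\".*?\")/` is the same alternation as the original pattern with the grouping moved, so it behaves identically to the original.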


      Demize
        The point is that there's no point in removing the extra stuff if you extract the fields correctly in the first place. It's already been shown how to do that.
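One way to extract the fields directly is a global match instead of split. This is a minimal sketch under the same assumptions as before (well-formed quoting, no escaped quotes, no embedded newlines), not necessarily the approach referred to in the thread; `extract_fields` is a hypothetical name, and Text::CSV remains the safer choice for real CSV.

```perl
use strict;
use warnings;

# Hypothetical helper: match each field instead of matching separators.
# \G anchors every match where the previous one ended; each match consumes
# either the start of the string or a comma, then one field: a complete
# quoted string or a (possibly empty) run of non-commas.
sub extract_fields {
    my ($line) = @_;
    return $line =~ /\G(?:^|,)("[^"]*"|[^,]*)/g;
}

my @fields   = extract_fields(qq{a,b,"hey, you","str1, str2, str3",end});
my @with_gap = extract_fields('a,b,,d,e');

print join('|', @fields), "\n";
print join('|', @with_gap), "\n";
```

Since the regex describes a field rather than a separator, there is nothing extra to filter out, and empty fields come back as empty strings.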
