
Re: split string by comma

by perlfan (Vicar)
on Jan 11, 2012 at 03:50 UTC (#947274)

in reply to split string by comma

I wouldn't use split. If you're expecting four comma-separated values in the format you showed (with the second value in double quotes), then you could do something like:
my @fields = $line =~ m/^(.+),"(.+)",(.+),(.+)$/;
As others have suggested, however, you may want to check out a Perl CSV module such as Text::CSV.
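If you do stay with a regex for this fixed four-field layout, a slightly safer sketch (an assumption on my part, not a general CSV parser) replaces each `.+` with a negated character class, so a capture cannot run past the next comma or quote. `parse_four` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: parse exactly four fields where the second is double-quoted.
# Negated classes ([^,]*, [^"]) stop each capture at the next separator instead of
# letting a greedy .+ swallow commas. Returns an empty list if the line does not fit.
sub parse_four {
    my ($line) = @_;
    my @f = $line =~ m/^([^,]*),"((?:[^"]|"")*)",([^,]*),([^,]*)$/
        or return;
    $f[1] =~ s/""/"/g;    # unescape doubled quotes inside the quoted field
    return @f;
}

my @ok  = parse_four('abc,"def, ghi",42,xyz');
my @bad = parse_four('1,"foo",2,"bar, joy",3,3.14,pi');
print scalar(@ok), " fields: @ok\n";
print scalar(@bad), " fields for the longer line\n";
```

A line that does not fit the expected shape is rejected outright rather than silently split in the wrong places.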

Re^2: split string by comma
by Tux (Abbot) on Jan 11, 2012 at 07:13 UTC

    That regular expression is way too greedy. Think about how it would parse:

    1,"foo",2,"bar, joy",3,3.14,pi,π
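A small sketch that runs perlfan's pattern over that line makes the problem concrete. Each greedy `(.+)` grabs as much as it can and backtracks only until the rest of the pattern matches, so the first capture swallows the first quoted field. The script has no `use utf8`, so π stays as raw UTF-8 bytes, which `.+` matches like any other characters.

```perl
use strict;
use warnings;

# perlfan's pattern, applied to Tux's counter-example
my $line   = '1,"foo",2,"bar, joy",3,3.14,pi,π';
my @fields = $line =~ m/^(.+),"(.+)",(.+),(.+)$/;

# The first (.+) backtracks only to the LAST ',"' that still lets the
# rest match, and the third (.+) keeps everything up to the last comma.
print "[$_]\n" for @fields;
# [1,"foo",2]
# [bar, joy]
# [3,3.14,pi]
# [π]
```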

    Correct regular expressions have been posted in this thread, but when dealing with real CSV data (what about embedded newlines?), you will most likely fail eventually if you stick to split or regular expressions. Please seriously consider using Text::CSV_XS or Text::CSV (which uses Text::CSV_XS when it is installed) and be done with it.
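For comparison, a minimal sketch of the module route, assuming Text::CSV (or Text::CSV_XS) is installed from CPAN. With `binary => 1` the parser accepts embedded newlines inside quoted fields, and `getline` returns one whole logical record even when it spans several physical lines. The sample data is made up for illustration.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN; uses Text::CSV_XS when available

# binary => 1 allows embedded newlines (and other binary bytes) in quoted fields
my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Made-up sample: a quoted field spanning two lines, then a line like Tux's
my $data = qq{1,"line one\nline two",x\n1,"foo",2,"bar, joy",3,3.14,pi\n};
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @rows;
while (my $row = $csv->getline($fh)) {
    push @rows, $row;    # each $row is an array ref of field values
}
$csv->eof or $csv->error_diag;    # report anything other than a clean EOF
close $fh;

print scalar(@$_), " fields: ", join(' | ', @$_), "\n" for @rows;
```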

    Another thing seldom considered by US users is that the "." in those "values" is locale dependent. Consider what will happen if 3623494.92 is printed as 3,623,494.92, or printed/exported in a Dutch locale using both the radix separator and the thousands (triad) separator from the locale: it will export as "3.623.494,92". Oh, the horror of "fixing" all those regular expressions :)
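A small core-Perl sketch of that failure mode: a made-up three-field record with an unquoted Dutch-formatted amount splits into four pieces, and the amount only becomes a plain Perl number again after swapping the separators back. `dutch_to_plain` is a hypothetical helper, and it assumes "." is only ever the thousands separator and "," only ever the radix.

```perl
use strict;
use warnings;

# A made-up three-field record exported with Dutch separators (unquoted)
my $record = 'id42,3.623.494,92,EUR';
my @naive  = split /,/, $record;
print scalar(@naive), " fields: ", join(' | ', @naive), "\n";
# 4 fields: id42 | 3.623.494 | 92 | EUR

# Hypothetical helper: drop thousands separators, then turn the radix comma into "."
sub dutch_to_plain {
    my ($s) = @_;
    $s =~ tr/.//d;    # 3.623.494,92 -> 3623494,92
    $s =~ tr/,/./;    # 3623494,92   -> 3623494.92
    return $s;
}
my $amount = dutch_to_plain('3.623.494,92');
print "$amount\n";    # 3623494.92
```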

    Enjoy, Have FUN! H.Merijn
      In order to avoid failure with embedded newlines (or your other record-separator of choice), I use this:
      local $/ = $self->record_delimiter;    # restored automatically, even if we die
      open(my $delimfile, '<', $filename)
          or Carp::confess("Cannot open file [$filename]: $!");
      while (my $record = <$delimfile>) {
          chomp $record;
          # If a line contains an odd number of double quotes ("), a quoted field
          # continues past the record separator, so keep reading until another line
          # with an odd number of double quotes closes it.
          # This catches fields that contain record separators (but are enclosed in "").
          if (($record =~ tr/"//) % 2 == 1) {
              while (my $next = <$delimfile>) {
                  chomp $next;
                  $record .= $/ . $next;    # put back the separator that chomp removed
                  last if ($next =~ tr/"//) % 2 == 1;
              }
          }
          push(@{$ar_returnvalue}, ReadRecord($self, $record));
      }
      close($delimfile);
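A self-contained sketch of the same line-joining idea, for trying it outside that class. It assumes the default `$/` of "\n", keeps a running quote count instead of checking each line's parity (equivalent, but easier to follow), and puts back the separator that `chomp` removed so the embedded newline survives. `read_records` and `quote_count` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: number of double-quote characters in a string
sub quote_count {
    my ($s) = @_;
    return $s =~ tr/"//;
}

# Hypothetical helper: read logical records from $fh, joining physical lines
# while a quoted field is still open (odd running total of double quotes).
sub read_records {
    my ($fh) = @_;
    my @records;
    while (my $record = <$fh>) {
        chomp $record;
        my $quotes = quote_count($record);
        while ($quotes % 2) {
            my $next = <$fh>;
            last unless defined $next;    # unterminated quote at EOF: keep what we have
            chomp $next;
            $record .= $/ . $next;        # put back the separator that chomp removed
            $quotes += quote_count($next);
        }
        push @records, $record;
    }
    return @records;
}

my $data = qq{a,"two\nlines",b\nc,d\n};
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @records = read_records($fh);
close $fh;
print scalar(@records), " records\n";    # 2 records
```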
      And ReadRecord uses a regex to consume the string field by field:
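      my @fields;    # hypothetical name: the original fragment does not show where fields are collected
      my $field_value;
      my $delimiter = quotemeta $self->field_delimiter;    # escape it for use inside [...]
      while (length $inputstring) {
          undef $field_value;
          if ($inputstring =~ /^"/) {
              # Quoted field: non-quote characters and escaped quotes (""), possibly empty
              if ($inputstring =~ /^"((?:[^"]|"")*)"(?:[$delimiter]|$)/p) {
                  ($field_value, $inputstring) = ($1, ${^POSTMATCH});
                  # Unescape escaped quotes
                  $field_value =~ s/""/"/g;
              }
              else {
                  Carp::confess("Parsing error with remaining data [$inputstring]");
              }
          }
          else {
              # Unquoted field: everything up to the next delimiter
              if ($inputstring =~ /^([^$delimiter"]*)(?:[$delimiter]|$)/p) {
                  ($field_value, $inputstring) = ($1, ${^POSTMATCH});
              }
              else {
                  # A stray quote inside an unquoted field; without this the loop never ends
                  Carp::confess("Parsing error with remaining data [$inputstring]");
              }
          }
          push @fields, $field_value;
      }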
      This conforms to RFC 4180 :)
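A self-contained sketch of a field splitter in the same spirit, assuming a single-character delimiter. It swaps the `${^POSTMATCH}` chopping for `\G` matching with `/gc`, which walks the string in place, and it keeps empty fields, including a trailing one after the last delimiter. `split_record` is a hypothetical name, and this is still no replacement for Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper: split one record into fields, honouring "..." quoting
# with "" as an escaped quote. $delim is assumed to be a single character.
sub split_record {
    my ($record, $delim) = @_;
    $delim = ',' unless defined $delim;
    my $d = quotemeta $delim;
    my @fields;
    pos($record) = 0;
    while (1) {
        if ($record =~ /\G"((?:[^"]|"")*)"(?=$d|\z)/gc) {
            (my $field = $1) =~ s/""/"/g;    # quoted field: unescape ""
            push @fields, $field;
        }
        elsif ($record =~ /\G([^"$d]*)(?=$d|\z)/gc) {
            push @fields, $1;                # unquoted (possibly empty) field
        }
        else {
            die "Parsing error at offset " . pos($record) . " in [$record]\n";
        }
        last if pos($record) == length $record;
        $record =~ /\G$d/gc;                 # step over the delimiter
    }
    return @fields;
}

my @f = split_record(q{1,"foo",2,"bar, joy",3,3.14,pi});
print scalar(@f), " fields: ", join(' | ', @f), "\n";
# 7 fields: 1 | foo | 2 | bar, joy | 3 | 3.14 | pi
```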
Re^2: split string by comma
by Anonymous Monk on Jan 11, 2012 at 04:45 UTC
    Thank you. That worked like a charm!!!
    Thank you much.

      Just to close things up... what was “that”? What approach worked for you?
