To avoid failures on fields with embedded newlines (or whatever record separator you use), I use this:

    my $old_INPUT_RECORD_SEPARATOR = $/;
    $/ = $self->record_delimiter;
    open (DELIMFILE, '<', $filename) or (Carp::confess("Cannot open file [$filename]: $!"));
    my $record;
    while (<DELIMFILE>) {
        chomp;
        $record = $_;
        # If a line contains an odd number of double quotes ("), we need to
        # keep reading until we find another line that contains an odd number
        # of double quotes. This catches fields that contain record separators
        # (but are enclosed in double quotes).
        if (grep ($_ eq '"', split ('', $_)) % 2 == 1) {
            # Keep reading data and appending to $record until we find
            # another line with an odd number of double quotes.
            while (<DELIMFILE>) {
                chomp;
                # Put back the separator that chomp removed, so the embedded
                # record separator stays part of the field.
                $record .= $/ . $_;
                if (grep ($_ eq '"', split ('', $_)) % 2 == 1) { last; }
            }
        } ## end if (grep ($_ eq '"', split...))
        push (@{$ar_returnvalue}, ReadRecord($self, $record));
    } ## end while (<DELIMFILE>)
    close (DELIMFILE);
    $/ = $old_INPUT_RECORD_SEPARATOR;
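
For anyone who wants to try the quote-counting idea in isolation, here is a minimal sketch. It is not the module code above: `read_logical_records` is a hypothetical helper name, it reads from an in-memory filehandle instead of a real file, and it counts quotes with `tr/"//` over the accumulated record, which should be equivalent to the per-line odd/even check.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): reads logical records from
# an open filehandle, joining physical lines while a quoted field is open.
sub read_logical_records {
    my ($fh, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    my @records;
    while (defined(my $record = <$fh>)) {
        chomp $record;
        # An odd number of double quotes so far means a quoted field is
        # still open, so the separator we just hit was inside that field.
        while (($record =~ tr/"//) % 2) {
            my $more = <$fh>;
            last unless defined $more;       # EOF inside a quoted field
            chomp $more;
            $record .= $separator . $more;   # keep the embedded separator
        }
        push @records, $record;
    }
    return @records;
}

# Two logical records; the first one spans two physical lines.
my $data = qq{a,"line one\nline two",c\nd,"say ""hi""",f\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = read_logical_records($fh, "\n");
close $fh;
print scalar(@records), " records\n";
```

Using `local $/` here is just a design choice for the sketch: it avoids the manual save and restore, and the old value comes back even if something dies halfway through.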

And ReadRecord uses a regex to consume the string field by field:

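    my $field_value;
    my $delimiter = $self->field_delimiter;
    # length() rather than truthiness, so a remaining "0" is still parsed.
    while (length $inputstring) {
        undef $field_value;
        if ($inputstring =~ /^"/) {
            # Quoted field: "" inside the quotes is an escaped quote.
            # * (not +) so that an empty quoted field ("") is accepted.
            if ($inputstring =~ /^"((?:[^"]|"")*)"(?:[\Q$delimiter\E]|$)/p) {
                ($field_value, $inputstring) = ($1, ${^POSTMATCH});
                # Unescape escaped quotes
                $field_value =~ s/""/"/g;
            } else {
                Carp::confess("Parsing error with remaining data [$inputstring]");
            }
        } else {
            # Unquoted field: everything up to the next delimiter or the end.
            if ($inputstring =~ /^([^\Q$delimiter\E"]*)(?:[\Q$delimiter\E]|$)/p) {
                ($field_value, $inputstring) = ($1, ${^POSTMATCH});
            } else {
                # A stray quote in an unquoted field; confess rather than loop forever.
                Carp::confess("Parsing error with remaining data [$inputstring]");
            }
        } ## end else [ if ($inputstring =~ /^"/)]
    }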

This follows the quoting rules of RFC 4180 (quoted fields, doubled quotes, embedded line breaks) :)
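
If you want to play with the field-by-field regex outside the module, something like the following should do. It is only a sketch under a few assumptions: `split_record` is a made-up name (not the real ReadRecord), it takes the delimiter as a plain argument, it anchors on `\z` instead of `$`, and it expects a record that has already been chomped.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the field loop; split_record is an
# illustrative name, not the ReadRecord from the post above.
sub split_record {
    my ($input, $delimiter) = @_;
    my $d = quotemeta $delimiter;    # make the delimiter regex-safe
    my @fields;
    while (length $input) {
        if ($input =~ /^"((?:[^"]|"")*)"(?:${d}|\z)/p) {
            # Quoted field: strip the outer quotes, then unescape "" to ".
            my $value = $1;
            $input = ${^POSTMATCH};
            $value =~ s/""/"/g;
            push @fields, $value;
        } elsif ($input =~ /^([^"${d}]*)(?:${d}|\z)/p) {
            # Unquoted field: everything up to the next delimiter or the end.
            push @fields, $1;
            $input = ${^POSTMATCH};
        } else {
            die "Parsing error with remaining data [$input]\n";
        }
    }
    return @fields;
}

my @fields = split_record(q{a,"b ""x"", y",,"",c}, ',');
print join('|', @fields), "\n";
```

One limitation worth knowing about: like the loop above, this stops as soon as the input is empty, so a record that ends in the delimiter loses its trailing empty field.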

In reply to Re^3: split string by comma by Neighbour
in thread split string by comma by Anonymous Monk
