PerlMonks  

Splitting a string on commas except when inside quotes

by anadem (Scribe)
on Dec 27, 2022 at 00:58 UTC (id://11149097)

anadem has asked for the wisdom of the Perl Monks concerning the following question:

My trusty code splits a string on commas:
( $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees, $fTax ) = split( /,/, $line );
Unfortunately, the data provider has now added a comma to the fDetails field and delimited that field with double quotes:
11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,,,

where previously there was neither a comma following Smith nor quotes around the details; split now erroneously gives:

$fDetails = "Transfer to Smith
$fRef = Savings account"

with subsequent fields also getting the wrong data of course.
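To see the shift concretely on the sample line used in the runnable fragment below, here is a small sketch that prints each element a plain `split` produces. Note that the opening quote stays attached to one field and the closing quote to the next:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';

# A plain split treats every comma as a separator, including the one
# inside the quoted Details field, so every later field moves over by one.
my @fields = split /,/, $line;

# Brackets make empty fields and leading spaces visible.
for my $i (0 .. $#fields) {
    print "$i: [$fields[$i]]\n";
}
```

Element 2 comes out as `"Transfer to Smith`, element 3 as ` account 2"`, and `USD` lands in element 5 instead of 4.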

What's a clean way to handle this, so that split ignores the comma within the double quotes? (I don't mind losing the comma within the details if that's easiest.) Kindly excuse my ignorance here; I retired some years ago and my brain has rusted.

Here's a runnable fragment:

$line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';
print " $line\n";
( $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees ) = split( /,/, $line );
printf("\n Date: %s; Type: %s; \n"
     ." Details: %s\n Reference: %s\n"
     ." Currency: %s; Amount: %s; PaidOut: %s; Fees: %s;\n",
     $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees );
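Since losing the comma inside the details is acceptable, one core-Perl option is to strip the commas (and the surrounding quotes) out of each quoted field before splitting. This is a minimal sketch, not taken from the replies below, and it assumes quoted fields never contain escaped `""` quotes or embedded newlines:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';

# Copy the line, then for every "..." pair delete the commas inside it and
# drop the surrounding quotes. The /e flag evaluates the replacement as
# Perl code. Assumption: no escaped "" quotes inside a quoted field.
( my $clean = $line ) =~ s{"([^"]*)"}{ ( my $inner = $1 ) =~ tr/,//d; $inner }ge;

# A negative limit keeps trailing empty fields instead of discarding them.
my ( $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees )
    = split /,/, $clean, -1;

print "Details: $fDetails\n";
print "PaidOut: $fPaidOut\n";
```

This prints `Details: Transfer to Smith account 2` and `PaidOut: 123.60`. It only works as long as the quoting stays this simple; a real CSV parser is the safer choice once the data gets more varied.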

Re: Splitting a string on commas except when inside quotes
by haukex (Archbishop) on Dec 27, 2022 at 07:12 UTC
Re: Splitting a string on commas except when inside quotes
by hippo (Bishop) on Dec 27, 2022 at 10:47 UTC

    Trivial solution using Text::CSV_XS (which is what I would reach for first):

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Text::CSV_XS;

    my $csv = Text::CSV_XS->new;
    my $line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';
    print " $line\n";
    $csv->parse ($line);
    my ( $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees ) = $csv->fields;
    printf("\n Date: %s; Type: %s; \n"
        ." Details: %s\n Reference: %s\n"
        ." Currency: %s; Amount: %s; PaidOut: %s; Fees: %s;\n",
        $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees );

    Enjoy.


    🦛

      Though that is a perfect answer, the code you showed uses parse and is thus unsafe when upgrading from this easy single-line example to parsing a complete file record by record: a quoted field may contain an embedded newline, so one record can span more than one physical line. There, getline would be the preferred method.

      Below is an example that uses the csv function from more recent versions of Text::CSV_XS:

      #!/usr/bin/perl
      use 5.014001;
      use warnings;
      use Text::CSV_XS qw( csv );

      my $line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';
      say "IN: $line";
      my $aoa = csv (in => \$line);
      my ($fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees) = @{$aoa->[0]};
      printf "\n Date: %s; Type: %s; \n".
          " Details: %s\n Reference: %s\n".
          " Currency: %s; Amount: %s; PaidOut: %s; Fees: %s;\n",
          $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees;

      To upgrade from this single line to the complete file,

      my $aoa = csv (in => \$line);

      changes to

      my $aoa = csv (in => $file);

      After that, every record is available in $aoa:

      foreach my $record (@$aoa) {
          my ($date, $type, $dtls, $ref, $curr, $amt, $paid, $fees) = @$record;
          ...;
      }

      Enjoy, Have FUN! H.Merijn
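A minimal sketch of the getline approach recommended above, assuming Text::CSV_XS is installed from CPAN. The two-record data set is hypothetical: the second record's Details field contains an embedded newline, which is exactly the case where reading line by line and calling parse would split one record in two. An in-memory filehandle stands in for a real file:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical input: the second record's quoted Details field contains a
# newline, so that record spans two physical lines.
my $data = qq{11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,\n}
         . qq{11/22/2022,Payment,"Transfer to Jones,\nsecond line",,USD,,45.00,\n};

# binary => 1 allows embedded newlines in quoted fields;
# auto_diag => 1 reports parse errors automatically.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Reading from a scalar reference avoids needing a real file for the demo.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# getline returns one array reference per logical record, or undef at EOF.
my @records;
while (my $row = $csv->getline($fh)) {
    push @records, $row;
}
close $fh;

printf "%d records\n", scalar @records;
print "Details: $_->[2]\n" for @records;
```

The loop yields two records even though the input has three physical lines.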
Re: Splitting a string on commas except when inside quotes
by tybalt89 (Monsignor) on Dec 27, 2022 at 02:45 UTC

    Like this?

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11149097
    use warnings;

    my $line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';
    print " $line\n";
    my ( $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees ) = "$line," =~ /(".*?"|[^,]*),/g; # NOTE
    #  = split( /,/, $line );
    printf("\n Date: %s; Type: %s; \n"
      ." Details: %s\n Reference: %s\n"
      ." Currency: %s; Amount: %s; PaidOut: %s; Fees: %s;\n",
      $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees );

    Outputs:

     11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,

     Date: 11/21/2022; Type: Payment;
     Details: "Transfer to Smith, account 2"
     Reference:
     Currency: USD; Amount: ; PaidOut: 123.60; Fees: ;

      my $line = '11/21/2022,Payment,"Transfer to Smith, account 2",,USD,,123.60,';
      print " $line\n";
      my ( $fDate, $fType, $fDetails, $fRef, $fCurrency, $fAmount, $fPaidOut, $fFees ) = "$line," =~ /(".*?"|[^,]*),/g; # NOTE

      What are that comma and those double quotes doing to $line?

        The regex expects a trailing comma after every field, and "$line," just adds one for the last field. The double quotes build a new string, so $line itself is not modified.
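A small sketch of that point, using a hypothetical line with no trailing comma so the effect is easy to see: without the appended comma, the final field has no comma after it and the pattern never captures it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = 'a,"b, c",d';    # hypothetical sample with no trailing comma

# Every field must be followed by a comma, so "d" at the end is missed.
my @without = $line =~ /(".*?"|[^,]*),/g;

# "$line," interpolates $line into a new string with one extra comma;
# $line itself is left unchanged.
my @with = "$line," =~ /(".*?"|[^,]*),/g;

print scalar(@without), " fields: @without\n";
print scalar(@with),    " fields: @with\n";
```

This prints `2 fields: a "b, c"` followed by `3 fields: a "b, c" d`. Note that the quoted field keeps its quotes with this regex.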

Approved by haukex