PerlMonks  

Re^2: Comparison of the parsing features of CSV (and xSV) modules

by dragonchild (Archbishop)
on Jun 15, 2004 at 14:17 UTC ([id://366899])


in reply to Re: Comparison of the parsing features of CSV (and xSV) modules
in thread Comparison of the parsing features of CSV (and xSV) modules

What would be some example data, how it's currently being parsed, and how you'd like it to be parsed?

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re^3: Comparison of the parsing features of CSV (and xSV) modules
by Wally Hartshorn (Hermit) on Jun 15, 2004 at 21:14 UTC

    Here's an example:

    "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"

    I'm using a series of (somewhat fragile) regexes to change that to:

    "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books","Bugger"

    Wally Hartshorn
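Wally's own regexes aren't shown. For comparison, the conversion he describes can be sketched as a single substitution, assuming a double quote is a field delimiter only when it touches a comma or the start or end of the line (the helper name `double_inner_quotes` is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper, not Wally's code. Assumption: a delimiting quote
# always touches a comma or the line boundary, so a quote with a non-comma
# character on BOTH sides is an embedded quote that must be doubled.
sub double_inner_quotes {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # drop the line ending first
    $line =~ s/(?<=[^,])"(?=[^,])/""/g;    # escape embedded quotes
    return $line;
}

my $raw = q{"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"};
print double_inner_quotes($raw), "\n";
```

Like any heuristic of this kind it breaks when an embedded quote sits next to a comma, e.g. `"Author of "Foo", and other books"`, where the quote after `Foo` is taken for a delimiter.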

      There are, of course, going to be boundary cases that don't work as expected as soon as you start allowing undoubled double quotes inside a format that expects them doubled. However, Text::xSV lets you define arbitrary filters to preprocess the text, and it should do a reasonable job on the above with the following filter:
      sub { my $line = shift; $line =~ s/\r$//; $line =~ s/(?<=.)"(?=.)/""/g; $line =~ s/"?,"?/,/g; return $line; }
      Yes, there is some fragility, but it should be at least moderately hard to trigger.
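To check the filter idea end to end without Text::xSV, the sketch below applies the filter to the sample line and splits the result with a small, hypothetical `split_csv` routine that understands only standard doubled-quote escaping. It uses the lookaround form of the doubling step; a plain `s/"(.)/""$1/g` also doubles the opening quote of the line, turning `"Smith"` into `""Smith"`.

```perl
use strict;
use warnings;

# Preprocessing filter: double every quote except the first and last
# character of the line, then collapse the doubling next to commas.
my $filter = sub {
    my $line = shift;
    $line =~ s/\r$//;              # strip a trailing carriage return
    $line =~ s/(?<=.)"(?=.)/""/g;  # double all quotes not at line start/end
    $line =~ s/"?,"?/,/g;          # restore single quotes around commas
    return $line;
};

# Hypothetical minimal splitter for standard CSV ("" = escaped quote),
# standing in for whatever parser consumes the filtered line.
sub split_csv {
    my ($line) = @_;
    chomp $line;
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^",]*))(,|\z)/g) {
        my ($quoted, $bare, $sep) = ($1, $2, $3);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;   # unescape doubled quotes
            push @fields, $quoted;
        }
        else {
            push @fields, $bare;
        }
        last if $sep eq '';        # matched end of line, no more fields
    }
    return @fields;
}

my $raw    = q{"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books","Bugger"};
my @fields = split_csv($filter->($raw));
print join("|", @fields), "\n";
```

Since the filtered line is ordinary doubled-quote CSV, swapping `split_csv` for a real CSV parser should yield the same five fields.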
      And, what should the parser do with the following:
      "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
      "Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
      "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger"
      "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested

        And, what should the parser do with the following:
        "Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"
        becomes
        "Smith","John",12/31/1962,"Author of ""How to Break Programs"" and other books,"Bugger"

        "Smith","John",12/31/1962,Author of "How to Break Programs" and other books,"Bugger"
        becomes
        "Smith","John",12/31/1962,Author of ""How to Break Programs"" and other books,"Bugger"

        "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books,"Bugger" (Reject?)
        "Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger" (Reject?)

        (I haven't encountered any improperly quoted data, just data that doesn't escape embedded delimiters.)

        Wally Hartshorn
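The policy in this reply can be sketched roughly as follows, assuming "reject" means "return undef so the caller can log or skip the line". The hypothetical `normalize_line` refuses any line in which a field opens with a single quote and otherwise doubles every quote that does not touch a comma or the line boundary, so both the quoted and the unquoted author fields come out as in the examples above.

```perl
use strict;
use warnings;

# Hypothetical normalizer. Returns the repaired line, or undef when a
# field opens with a single quote (treated as improperly quoted).
sub normalize_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # ignore the line ending
    return if $line =~ /(?:^|,)'/;         # 'single-quoted' field: reject
    # Double every quote with a non-comma character on both sides; quotes
    # touching a comma or the line boundary are treated as delimiters.
    $line =~ s/(?<=[^,])"(?=[^,])/""/g;
    return $line;
}

for my $raw (
    q{"Smith","John",12/31/1962,"Author of "How to Break Programs" and other books,"Bugger"},
    q{"Smith","John",12/31/1962,'Author of "How to Break Programs" and other books',"Bugger"},
) {
    my $fixed = normalize_line($raw);
    print defined $fixed ? "$fixed\n" : "REJECTED: $raw\n";
}
```

The single-quote test is deliberately blunt: it would also reject a legitimate unquoted field that starts with an apostrophe (e.g. `'70s`), so real data may need a tighter rule.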
