Re: Re: DBD::CSV limitation versus aging/unmaintained modules (lazy)

by Eyck (Priest)
on Jan 16, 2004 at 10:53 UTC [id://321768]


in reply to Re: DBD::CSV limitation versus aging/unmaintained modules (lazy)
in thread DBD::CSV limitation versus aging/unmaintained modules

Hmm, I thought I had already found where the problem is - record 3964 is the one after which DBD::CSV stops noticing any more records.

I just can't find what exactly DBD::CSV finds wrong about that record/line, and more importantly, why it doesn't emit any kind of warning when it hits those 'corrupted' lines.
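
A minimal sketch of one way to make the silent truncation visible from the DBI side - the file name data.csv, the csv_tables mapping, and the row-count check are assumptions for illustration, not part of the original setup:

    #!/usr/bin/perl
    # Compare the number of rows DBD::CSV actually returns against the
    # number of data lines in the raw file; a mismatch means records are
    # being swallowed silently.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:CSV:f_dir=.", undef, undef,
                           { RaiseError => 1, PrintError => 1 })
        or die "Cannot connect: $DBI::errstr";
    # Map the table name onto the actual file (assumed to be data.csv).
    $dbh->{csv_tables}{data} = { file => 'data.csv' };

    # Count the rows DBD::CSV sees.
    my $sth = $dbh->prepare("SELECT * FROM data");
    $sth->execute;
    my $sql_rows = 0;
    $sql_rows++ while $sth->fetchrow_arrayref;

    # Count the raw lines (minus the header line DBD::CSV uses for column names).
    open my $fh, '<', 'data.csv' or die "Cannot open data.csv: $!";
    my $raw_lines = 0;
    $raw_lines++ while <$fh>;
    close $fh;

    warn "DBD::CSV returned $sql_rows rows, but the file has ",
         $raw_lines - 1, " data lines\n"
        if $sql_rows != $raw_lines - 1;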


Re: Re: Re: DBD::CSV limitation versus aging/unmaintained modules (lazy)
by tilly (Archbishop) on Jan 16, 2004 at 15:27 UTC
    It is possible that Text::xSV can point it out for you. The biggest likelihood is an unmatched " causing it to read the entire rest of the file as one really, really long line. (It keeps switching from quoted to non-quoted and back as it hits ", and always hits the end of line inside quotes, so it includes the return in a field.)

    If you post 3 lines from the file (the line before, that line, and the line after) I should be able to spot it visually. But before you post, verify that DBD::CSV thinks those 3 lines are only 2 rows.
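
    One quick way to test the unmatched-quote theory without any module is to look for lines carrying an odd number of " characters - a rough sketch, assuming the file is called data.csv and that quotes are never backslash-escaped:

        #!/usr/bin/perl
        # A line with an odd number of " characters leaves the parser
        # "inside" a quoted field, so everything after it gets slurped
        # into one enormous record.
        use strict;
        use warnings;

        open my $fh, '<', 'data.csv' or die "Cannot open data.csv: $!";
        while (my $line = <$fh>) {
            my $quotes = () = $line =~ /"/g;   # count " on this line
            print "line $.: odd number of quotes ($quotes)\n" if $quotes % 2;
        }
        close $fh;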

Re: Re: Re: DBD::CSV limitation versus aging/unmaintained modules (lazy)
by Eyck (Priest) on Jan 16, 2004 at 16:11 UTC

    OK, thanks tilly. After reading your reply I finally started to see what is wrong with that line; it contains something like this:

    ,"Description description "hi world" rest of description",
    And overly-smart modules fail to parse that (not surprisingly).

    While it's easy to state that such a file is badly formatted, it was emitted from a large Oracle-based system and there's nothing I can do about it (not that I would pursue such a noble cause now that I've solved the problem on my side).
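
    For comparison, a conforming CSV writer would have doubled the embedded quotes instead. A small sketch using Text::CSV (not one of the modules discussed in the thread, used here purely for illustration) of how that field should have been written:

        #!/usr/bin/perl
        # Build the record with a literal " inside the field and let the
        # module do the quoting; embedded quotes come out doubled ("").
        use strict;
        use warnings;
        use Text::CSV;

        my $csv = Text::CSV->new({ binary => 1 })
            or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

        my @fields = ('', 'Description description "hi world" rest of description', '');
        $csv->combine(@fields) or die "combine failed";
        print $csv->string, "\n";
        # Should print something like:
        # ,"Description description ""hi world"" rest of description",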

      But you have not actually solved the problem from your side. You have just hidden it - guaranteed that if any field anywhere has a comma in it, then you will silently give wrong results.

      I would suggest having your code at least put in some highly visible check - for instance, for an unexpected number of fields. And escalate the formatting issue a level or two, because if their output doesn't format CSV correctly, then at some point there is nothing you can do to work around the breakage.
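
      A sketch of that kind of loud field-count check, here using Text::CSV to do the parsing; the file name, the expected-count-from-header rule and the die-on-mismatch policy are all assumptions for illustration:

          #!/usr/bin/perl
          # Refuse to continue as soon as a record does not have the same
          # number of fields as the header line.
          use strict;
          use warnings;
          use Text::CSV;

          my $csv = Text::CSV->new({ binary => 1 })
              or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

          open my $fh, '<', 'data.csv' or die "Cannot open data.csv: $!";
          my $header   = $csv->getline($fh) or die "Cannot read header line";
          my $expected = scalar @$header;

          while (my $row = $csv->getline($fh)) {
              die sprintf("line %d: expected %d fields, got %d\n",
                          $., $expected, scalar @$row)
                  if @$row != $expected;
              # ... process the row normally ...
          }
          die "CSV parse error near line $.\n" unless $csv->eof;
          close $fh;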

        Ok, thanks.

        But even in the case of a comma in one of the fields I end up with one broken line, not with the whole datafile ignored.
