Text::CSV and very large Text Files

by webchalkboard (Scribe)
on Apr 19, 2005 at 10:36 UTC

webchalkboard has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Does anyone have any experience working with Text::CSV on very large text files? My text file is about 570 MB, and understandably Text::CSV is a little slow. Will it be able to cope with a file of this size, or is there a more efficient way of dealing with it?

Thanks,

Tom

Learning without thought is labor lost; thought without learning is perilous. - Confucius
WebChalkboard.com | For the love of art...
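
(For context, a minimal Text::CSV loop of the kind the question implies might look like the sketch below. The file name is a placeholder, and note that parsing one physical line at a time will not cope with quoted fields that contain embedded newlines.)

    use strict;
    use warnings;
    use Text::CSV;

    # 'big_file.csv' stands in for the 570 MB file from the question
    my $csv = Text::CSV->new();
    open my $fh, '<', 'big_file.csv' or die "Cannot open big_file.csv: $!";

    while ( my $line = <$fh> ) {
        chomp $line;
        $csv->parse($line) or die "Could not parse line $.: $line";
        my @fields = $csv->fields();
        # ... process @fields ...
    }

    close $fh;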

Replies are listed 'Best First'.
Re: Text::CSV and very large Text Files
by friedo (Prior) on Apr 19, 2005 at 10:51 UTC
    Text::CSV_XS is an interface to a CSV parser written in C, which is very fast. It's definitely worth a try. Unfortunately, even in C, parsing character-separated values is not a particularly fast operation. If you have to access the data a lot, consider importing it into a database with fixed-width fields and an index.
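    A minimal streaming read with Text::CSV_XS might look like the sketch below (the file name is a placeholder); getline() reads and parses one record at a time, so the whole 570 MB file never has to be held in memory at once:

        use strict;
        use warnings;
        use Text::CSV_XS;

        # binary => 1 lets the parser accept non-ASCII bytes in fields
        my $csv = Text::CSV_XS->new({ binary => 1 });
        open my $fh, '<', 'big_file.csv' or die "Cannot open big_file.csv: $!";

        while ( my $row = $csv->getline($fh) ) {
            # $row is an array reference holding the fields of one record
            # ... process @$row ...
        }

        close $fh;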
Re: Text::CSV and very large Text Files
by dragonchild (Archbishop) on Apr 19, 2005 at 12:47 UTC
    Use Text::xSV instead. It's pure Perl, very fast, and handles edge cases better than Text::CSV or Text::CSV_XS. Oh - and it allows for any single-character separator. :-)
      > Use Text::xSV instead.
      Update: I'd agree that if a pure Perl solution is needed, it is the best and should be used instead of Text::CSV.
      Text::xSV is a very good module.
      > It's pure Perl, very fast,
      But not as fast as Text::CSV_XS; see Benchmark comparison of Text::xSV and Text::CSV_XS. In many cases that speed difference wouldn't matter, but the OP is specifically talking about large files where speed is an issue.
      > and handles edge cases better than Text::CSV
      Perhaps.
      > or Text::CSV_XS.
      Which edge cases does it handle better? See Comparison of the parsing features of CSV (and xSV) modules
      > Oh - and it allows for any single-character separator. :-)
      As do both Text::CSV and Text::CSV_XS.
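      For reference, whichever side of the speed argument one takes, the basic Text::xSV reading pattern looks roughly like the sketch below (method names follow the module's documented synopsis; the file and column names are made up):

          use strict;
          use warnings;
          use Text::xSV;

          # the constructor takes the file name and (optionally) the separator
          my $csv = Text::xSV->new( filename => 'big_file.csv', sep => ',' );

          $csv->read_header();    # the first row supplies the column names

          while ( $csv->get_row() ) {
              # extract() returns the requested columns by name
              my ($name, $value) = $csv->extract(qw(name value));
              # ... process the named columns ...
          }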
Re: Text::CSV and very large Text Files
by blazar (Canon) on Apr 19, 2005 at 11:08 UTC
    I have used Text::CSV_XS on a 70 MB file reliably and with satisfaction. The processing was indeed somewhat slow, so I prepared a caching mechanism suited to the application I was working on.
Re: Text::CSV and very large Text Files
by webchalkboard (Scribe) on Apr 19, 2005 at 11:16 UTC

    Well, I've got the script running and it seems to be doing OK. We have a pretty capable server, so that probably helps. Speed isn't too much of an issue as it's just a background job, and there are plenty of other bottlenecks anyway ;)

    Thanks for the replies though; I'll definitely consider Text::CSV_XS next time. I've actually used it before, but I was running my scripts on various different servers and got fed up with having to ask for Perl modules to be installed, so now I try to use the standard ones where possible.

    The file has some interesting foreign characters though; I've set the parser to binary mode to hopefully deal with that. That's been a problem for me before...
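
    (A minimal sketch of that binary setting, assuming a Text::CSV version that accepts constructor options and getline(), as recent releases built on Text::CSV_XS/Text::CSV_PP do; the file name is a placeholder:)

        use strict;
        use warnings;
        use Text::CSV;

        # binary => 1 makes the parser accept bytes outside printable ASCII,
        # e.g. accented characters or embedded newlines inside quoted fields
        my $csv = Text::CSV->new({ binary => 1 })
            or die "Cannot construct Text::CSV: " . Text::CSV->error_diag();

        open my $fh, '<', 'big_file.csv' or die "Cannot open big_file.csv: $!";

        while ( my $row = $csv->getline($fh) ) {
            # ... process @$row ...
        }

        close $fh;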

    Learning without thought is labor lost; thought without learning is perilous. - Confucius
    WebChalkboard.com | For the love of art...
Re: Text::CSV and very large Text Files
by rupesh (Hermit) on Apr 19, 2005 at 10:46 UTC
    Probably Text::CSV would help.

    Cheers,
    Rupesh.

    Update: I also found this
