Text::CSV and very large Text Files

by webchalkboard (Scribe)
on Apr 19, 2005 at 10:36 UTC

webchalkboard has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Does anyone have any experience working with Text::CSV on very large text files? My text file is about 570 MB, and understandably Text::CSV is a little slow. Will it be able to cope with a file of this size, or is there a more efficient way of dealing with it?

Thanks,

Tom

Learning without thought is labor lost; thought without learning is perilous. - Confucius
WebChalkboard.com | For the love of art...
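
(For context, a minimal Text::CSV loop of the kind the question implies might look like the sketch below. The file name is a placeholder, and note that parsing one physical line at a time will not cope with quoted fields that contain embedded newlines.)

    use strict;
    use warnings;
    use Text::CSV;

    # 'big_file.csv' stands in for the 570 MB file from the question
    my $csv = Text::CSV->new();
    open my $fh, '<', 'big_file.csv' or die "Cannot open big_file.csv: $!";

    while ( my $line = <$fh> ) {
        chomp $line;
        $csv->parse($line) or die "Could not parse line $.: $line";
        my @fields = $csv->fields();
        # ... process @fields ...
    }

    close $fh;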

Replies are listed 'Best First'.
Re: Text::CSV and very large Text Files
by friedo (Prior) on Apr 19, 2005 at 10:51 UTC
    Text::CSV_XS is an interface to a CSV parser written in C, which is very fast. It's definitely worth a try. Unfortunately, even in C, parsing character-separated values is not a particularly fast operation. If you have to access the data a lot, consider importing it into a database with fixed-width fields and an index.
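    A minimal streaming read with Text::CSV_XS might look like the sketch below (the file name is a placeholder); getline() reads and parses one record at a time, so the whole 570 MB file never has to be held in memory at once:

        use strict;
        use warnings;
        use Text::CSV_XS;

        # binary => 1 lets the parser accept non-ASCII bytes in fields
        my $csv = Text::CSV_XS->new({ binary => 1 });
        open my $fh, '<', 'big_file.csv' or die "Cannot open big_file.csv: $!";

        while ( my $row = $csv->getline($fh) ) {
            # $row is an array reference holding the fields of one record
            # ... process @$row ...
        }

        close $fh;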
Re: Text::CSV and very large Text Files
by dragonchild (Archbishop) on Apr 19, 2005 at 12:47 UTC
    Use Text::xSV instead. It's pure Perl, very fast, and handles edge cases better than Text::CSV or Text::CSV_XS. Oh - and it allows for any single-character separator. :-)
      > Use Text::xSV instead.
      Update: I'd agree that if a pure Perl solution is needed, it is the best and should be used instead of Text::CSV.
      Text::xSV is a very good module.
      > It's pure Perl, very fast,
      But not as fast as Text::CSV_XS; see Benchmark comparison of Text::xSV and Text::CSV_XS. In many cases that speed difference wouldn't matter, but the OP is specifically talking about large files where speed is an issue.
      > and handles edge cases better than Text::CSV
      Perhaps.
      > or Text::CSV_XS.
      Which edge cases does it handle better? See Comparison of the parsing features of CSV (and xSV) modules
      > Oh - and it allows for any single-character separator. :-)
      As do both Text::CSV and Text::CSV_XS.
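      For reference, whichever side of the speed argument one takes, the basic Text::xSV reading pattern looks roughly like the sketch below (method names follow the module's documented synopsis; the file and column names are made up):

          use strict;
          use warnings;
          use Text::xSV;

          # the constructor takes the file name and (optionally) the separator
          my $csv = Text::xSV->new( filename => 'big_file.csv', sep => ',' );

          $csv->read_header();    # the first row supplies the column names

          while ( $csv->get_row() ) {
              # extract() returns the requested columns by name
              my ($name, $value) = $csv->extract(qw(name value));
              # ... process the named columns ...
          }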
Re: Text::CSV and very large Text Files
by blazar (Canon) on Apr 19, 2005 at 11:08 UTC
    I have used Text::CSV_XS on a 70 MB file reliably and with satisfaction. The processing was indeed somewhat slow, so I prepared a caching mechanism suited to the application I was working on.
Re: Text::CSV and very large Text Files
by webchalkboard (Scribe) on Apr 19, 2005 at 11:16 UTC

    Well, I've got the script running and it seems to be doing OK. We have a pretty capable server, so that probably helps. Speed isn't too much of an issue as it's just a background job, and there are plenty of other bottlenecks anyway ;)

    Thanks for the replies though; I'll definitely consider Text::CSV_XS next time. I've actually used it before, but I was running my scripts on various different servers and got fed up with having to ask for Perl modules to be installed, so now I try to use the standard ones where possible.

    The file has some interesting foreign characters though; I've set the parser to binary mode to hopefully deal with that. That's been a problem for me before...
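
    (A minimal sketch of that binary setting, assuming a Text::CSV version that accepts constructor options and getline(), as recent releases built on Text::CSV_XS/Text::CSV_PP do; the file name is a placeholder:)

        use strict;
        use warnings;
        use Text::CSV;

        # binary => 1 makes the parser accept bytes outside printable ASCII,
        # e.g. accented characters or embedded newlines inside quoted fields
        my $csv = Text::CSV->new({ binary => 1 })
            or die "Cannot construct Text::CSV: " . Text::CSV->error_diag();

        open my $fh, '<', 'big_file.csv' or die "Cannot open big_file.csv: $!";

        while ( my $row = $csv->getline($fh) ) {
            # ... process @$row ...
        }

        close $fh;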

    Learning without thought is labor lost; thought without learning is perilous. - Confucius
    WebChalkboard.com | For the love of art...
Re: Text::CSV and very large Text Files
by rupesh (Hermit) on Apr 19, 2005 at 10:46 UTC
    Probably Text::CSV would help.

    Cheers,
    Rupesh.

    Update: I also found this
