Re: Netflix (or on handling large amounts of data efficiently in perl)

sheer weight of data

That, right there, screams, "database." This volume of data ("overwhelming") is exactly what you use a database for. Don't worry about the speed - you can tweek the parameters to speed things up (a sweet spot for RAM usage where the db keeps stuff in memory, for example). Or you can move to a faster database. Or bigger hardware. But what you're not going to do is beat the speed by writing your own code in perl. Not because perl isn't fast, but because it'll take you forever to do (and then there's bugfixing).

If you don't want the external dependency of MySQL, then try DBD::SQLite, though I suspect MySQL to be faster.

By having a database system, complete with proper indexing, you can shunt most of the heavy lifting off to the C/C++ code instead, including its native handling of strings, etc., with far less overhead than Perl. It'll do precisely what the C++ code you refer to does: have an index which it searches, and then uses the offsets there to find the data in the data file(s). And it'll use mmap, if that's what is appropriate. And you don't have to write any code - just call the API with the query you want. This will allow you to focus on the real problem you're trying to tackle rather than computer-science details about how to support the problem.

Comment on Re: Netflix (or on handling large amounts of data efficiently in perl)


Perl: the Markov chain saw
	PerlMonks