PerlMonks  

Re: Netflix (or on handling large amounts of data efficiently in perl)

by Tanktalus (Canon)
on Dec 24, 2008 at 14:43 UTC ([id://732468])


in reply to Netflix (or on handling large amounts of data efficiently in perl)

sheer weight of data

That, right there, screams "database." This volume of data ("overwhelming") is exactly what a database is for. Don't worry about the speed: you can tweak the parameters to speed things up (finding the sweet spot for RAM usage where the db keeps stuff in memory, for example). Or you can move to a faster database. Or bigger hardware. But what you're not going to do is beat that speed by writing your own code in Perl. Not because Perl isn't fast, but because it'll take you forever to write (and then there's the bug fixing).

If you don't want the external dependency of MySQL, then try DBD::SQLite, though I suspect MySQL will be faster.

By having a database system, complete with proper indexing, you can shunt most of the heavy lifting off to the database's C/C++ code, including its native handling of strings, etc., with far less overhead than Perl. It'll do precisely what the C++ code you refer to does: keep an index which it searches, and then use the offsets there to find the data in the data file(s). And it'll use mmap, if that's what is appropriate. And you don't have to write any of that code yourself: just call the API with the query you want. This will allow you to focus on the real problem you're trying to tackle rather than the computer-science details of how to support it.
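
To make "just call the API" a bit more concrete, here is a minimal sketch using DBI with DBD::SQLite. The table layout (movie_id, user_id, rating), the sample rows, the index name and the ratings_summary helper are all assumptions made up for illustration; nothing in the thread says what the real schema looks like. It uses an in-memory database so it runs as-is; point dbname at a file for real data.

```perl
use strict;
use warnings;
use DBI;

# Assumption: a hypothetical ratings table; the real schema is not given in the thread.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE ratings (movie_id INTEGER, user_id INTEGER, rating INTEGER)');

# Load the rows inside a single transaction; committing row by row is much slower.
my @rows = ([1, 100, 5], [1, 101, 3], [2, 100, 4], [2, 102, 2], [2, 103, 3]);
$dbh->begin_work;
my $ins = $dbh->prepare(
    'INSERT INTO ratings (movie_id, user_id, rating) VALUES (?, ?, ?)');
$ins->execute(@$_) for @rows;
$dbh->commit;

# The index is what lets the engine seek to a movie instead of scanning every row.
$dbh->do('CREATE INDEX idx_ratings_movie ON ratings (movie_id)');

# Hypothetical helper: number of ratings and average rating for one movie.
sub ratings_summary {
    my ($dbh, $movie_id) = @_;
    my ($count, $avg) = $dbh->selectrow_array(
        'SELECT COUNT(*), AVG(rating) FROM ratings WHERE movie_id = ?',
        undef, $movie_id);
    return ($count, $avg);
}

my ($count, $avg) = ratings_summary($dbh, 2);
printf "movie 2: %d ratings, average %.2f\n", $count, $avg;
```

The Perl side only prepares statements and reads back small result sets; the counting, averaging and index lookups all happen inside the database engine. Swapping in MySQL should mostly be a matter of changing the DSN passed to DBI->connect, though SQL dialect details can differ.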
