Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
sheer weight of data

That, right there, screams, "database." This volume of data ("overwhelming") is exactly what you use a database for. Don't worry about the speed - you can tweek the parameters to speed things up (a sweet spot for RAM usage where the db keeps stuff in memory, for example). Or you can move to a faster database. Or bigger hardware. But what you're not going to do is beat the speed by writing your own code in perl. Not because perl isn't fast, but because it'll take you forever to do (and then there's bugfixing).

If you don't want the external dependency of MySQL, then try DBD::SQLite, though I suspect MySQL to be faster.

By having a database system, complete with proper indexing, you can shunt most of the heavy lifting off to the C/C++ code instead, including its native handling of strings, etc., with far less overhead than Perl. It'll do precisely what the C++ code you refer to does: have an index which it searches, and then uses the offsets there to find the data in the data file(s). And it'll use mmap, if that's what is appropriate. And you don't have to write any code - just call the API with the query you want. This will allow you to focus on the real problem you're trying to tackle rather than computer-science details about how to support the problem.


In reply to Re: Netflix (or on handling large amounts of data efficiently in perl) by Tanktalus
in thread Netflix (or on handling large amounts of data efficiently in perl) by Garp

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2024-04-26 08:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found