sheer weight of data
That, right there, screams, "database." This volume of data ("overwhelming") is exactly what you use a database for. Don't worry about the speed - you can tweek the parameters to speed things up (a sweet spot for RAM usage where the db keeps stuff in memory, for example). Or you can move to a faster database. Or bigger hardware. But what you're not going to do is beat the speed by writing your own code in perl. Not because perl isn't fast, but because it'll take you forever to do (and then there's bugfixing).
If you don't want the external dependency of MySQL, then try DBD::SQLite, though I suspect MySQL to be faster.
By having a database system, complete with proper indexing, you can shunt most of the heavy lifting off to the C/C++ code instead, including its native handling of strings, etc., with far less overhead than Perl. It'll do precisely what the C++ code you refer to does: have an index which it searches, and then uses the offsets there to find the data in the data file(s). And it'll use mmap, if that's what is appropriate. And you don't have to write any code - just call the API with the query you want. This will allow you to focus on the real problem you're trying to tackle rather than computer-science details about how to support the problem.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|