Ok. Here's a crude demo in case someone is interested in doing benchmarking comparisons or whatever. The program is pure C, tested on linux 64-bit. Give two arguments: Nkeys and Filename. File should contain sym-value pairs, both unique in their domains. (Space-separated, one pair per line).
Update. Some additional notes regarding the program.
Portability is easily improved upon:
- Replace mmap/mremap with malloc/realloc. This is trivial.
- The test for direct vs indirect entries (ss offset) might be written as !(s->o >> 56), avoiding issues with endianness.
There are at least two bugs to fix:
- Verify sym to val loop has no test for NULL; this will fault on a missing key.
- The fetch_by_key should not proceed with a memcmp in case keylen > 8, but the slot is direct (ie <= 8).
Many further optimizations to consider:
- Try to allocate ss strings so that cache-line boundaries aren't crossed.
- Terminate strings with a high-bit: more compact, plus faster than using memchr.
- Specific inlined implementations for 8-byte memcmp, etc.
- Inplace reorder looks up every key twice. Place an (un)processed flag to halve this.
- Choose a faster hash function for string keys, FNV perhaps.
- A primitive hash function for values. Just a mul by constant?
There's also a wishlist for the CMPH library, but enough for now.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|