Is there an even mix of lookup by number and by name?
The algorithm is a GA exploration, so the lookups are effectively random and approximately evenly distributed across the full ranges of both domains, at least in the early rounds.
As the selection progresses towards a minimum, the visited domains shrink, but the lookups in both domains remain roughly equal.
In addition, after each iteration, many new pairs are selected (from the lookup tables), with an equivalent number of old ones discarded (from the actively considered subset, not the lookup tables), thus potentially opening up the full ranges again.
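To make that access pattern concrete, here is a minimal Python sketch. It is not the actual simulation code: the names `make_tables`, `iterate`, `by_number` and `by_name`, the `n<number>` naming scheme and all the sizes are assumptions made up for illustration. It only shows the shape of one round: some members of the active subset are discarded, and the same number are drawn afresh from anywhere in the full table.

```python
import random

def make_tables(n, seed=0):
    """Build a toy two-way lookup: number -> name and name -> number."""
    rng = random.Random(seed)
    by_number, by_name = {}, {}
    for num in rng.sample(range(10 * n), n):
        name = f"n{num}"          # hypothetical naming scheme
        by_number[num] = name
        by_name[name] = num
    return by_number, by_name

def iterate(active, by_number, replace, rng):
    """One round: drop `replace` members of the active subset and draw
    the same number of replacements from the whole lookup table."""
    survivors = set(rng.sample(sorted(active), len(active) - replace))
    # Replacements may come from anywhere in the table, including
    # entries that were discarded a moment ago.
    candidates = [k for k in by_number if k not in survivors]
    fresh = rng.sample(candidates, replace)
    return survivors | set(fresh)

by_number, by_name = make_tables(1000)
rng = random.Random(1)
active = set(rng.sample(sorted(by_number), 50))
for _ in range(10):
    active = iterate(active, by_number, 10, rng)
```

The active subset stays the same size, but because replacements are drawn from the whole table, any entry can become live again in any round. That is why both tables have to stay fully resident.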
Is there any bias to the names/values looked up? In other words, are certain values more likely to be looked up than others? If so, is there a way of knowing which ones are more likely before your lookup pass?
As you'll gather from the above, no on all counts.
Similarly: are recently-looked-up values more likely to be looked up again soon, or is it a totally random distribution?
Yes, for brief periods of the total run, but then a different set will be active, then another. And there is no way to predict which subsets will be required at any given time.
There is no mileage in only having a subset available at any given time.
As I understand your question, the data is loaded first, then used frequently afterwards. Is my understanding correct?
Actually generated rather than "loaded"; but yes.
The dataset is generated as a random subset of a truly huge domain of possibilities.
The larger the subset that can be generated -- limited by the size of the lookup table(s) and physical memory -- the more statistically valid the results.
I'm currently limited to runs of 35e6 pairs -- ~6GB of RAM -- which represents ~0.000000003% of the total possibilities.
I'm seeking a way to reduce the memory footprint without too much loss of performance, in order that I might increase the statistical validity of the simulation.
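For a sense of scale, the figures above work out as follows. This is back-of-envelope arithmetic only, assuming "6GB" means 6e9 bytes and that the 35e6 figure counts name/number pairs. The helper names are made up for the illustration.

```python
def bytes_per_pair(ram_bytes, pairs):
    """Average memory cost of one pair, all overheads included."""
    return ram_bytes / pairs

def domain_size(pairs, percent_of_total):
    """Size of the full domain implied by the sampled percentage."""
    return pairs / (percent_of_total / 100)

def pairs_that_fit(ram_bytes, cost_per_pair):
    """How many pairs fit in the same RAM at a given per-pair cost."""
    return ram_bytes / cost_per_pair

PAIRS = 35e6
RAM = 6e9                                # assumption: GB taken as 1e9 bytes

cost = bytes_per_pair(RAM, PAIRS)        # roughly 171 bytes per pair
total = domain_size(PAIRS, 0.000000003)  # roughly 1.2e18 possibilities
doubled = pairs_that_fit(RAM, cost / 2)  # halving the cost doubles the run
```

So each pair currently costs something like 170 bytes, and every byte shaved off that figure translates directly into a larger sample within the same memory.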