http://qs321.pair.com?node_id=329121

in reply to Renovating Best Nodes

I like it, but I think that it would be nice to weight Selected Best Nodes more heavily towards the top few nodes.

As things stand, most runs don't include a single node from the top 20, and more than a quarter of the time we won't get anything from the top 50.

Of course getting a broad selection is good as well.

The following snippet shows how you can balance the two fairly flexibly:

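```
# I'm assuming that sth returns a long list of nodes ordered
# from lowest rep to highest and then newest to oldest.
my @selected;
# Seed the selection with the first 50 rows.
for (1..50) {
    push @selected, $sth->fetchrow_hashref();
}
# Each later row has a 10% chance of replacing a random slot, so the
# rows read last (the highest rep) are the least likely to be overwritten.
while (my $row = $sth->fetchrow_hashref()) {
    if (rand(1) < 0.1) {
        $selected[rand(@selected)] = $row;
    }
}
```
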
The resulting distribution has the following properties (back of the envelope calculation):

1. Most of the time we have some node in the top 10.
2. Our odds of not getting something in the top 50 are about 0.5%.
3. A node's chance of getting in is better than under the current scheme if it is in the top 692, or out of the top 2000.
4. About 60% of the time we get at least one node from outside the top 2000 included.
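
The figures above can be roughly reproduced with a few lines of arithmetic. This is only a sketch under assumptions the post does not state: that the current scheme shows 50 of the top 2000 nodes chosen uniformly at random, and that the 50 slots can be treated as independent for the last figure. The names `chance_in` and `miss_top` are made up for illustration.

```perl
use strict;
use warnings;

# Assumptions (not stated in the post): 50 slots and a 10% pick rate as in
# the snippet, and a current scheme that shows 50 of the top 2000 nodes
# chosen uniformly at random.
my ($slots, $pick, $pool) = (50, 0.1, 2000);

# A later row overwrites one particular slot with probability $pick / $slots.
my $survive = 1 - $pick / $slots;

# Chance that the node ranked $rank from the top ends up selected.
# Only valid for rows read after the first $slots seed rows.
sub chance_in {
    my ($rank) = @_;
    return $pick * $survive ** ($rank - 1);
}

# The last picked row is never overwritten, so the top $n are all missing
# exactly when none of the final $n rows is picked.
sub miss_top {
    my ($n) = @_;
    return (1 - $pick) ** $n;
}

# Per-node chance under the assumed current scheme.
my $baseline = $slots / $pool;

# Lowest-ranked node whose chance still beats the assumed baseline.
my $crossover = 1 + int( log($baseline / $pick) / log($survive) );

# Treating slots as independent (an approximation): chance that at least
# one slot still holds a node from outside the top $pool.
my $deep = 1 - (1 - $survive ** $pool) ** $slots;

printf "some node in the top 10:       %.1f%%\n", 100 * (1 - miss_top(10));
printf "nothing in the top 50:         %.2f%%\n", 100 * miss_top(50);
printf "beats the baseline up to rank: %d\n",     $crossover;
printf "some node outside top %d:    %.1f%%\n",   $pool, 100 * $deep;
```

The crossover rank this prints may differ by one from the figure in the list, which is within what a back of the envelope estimate allows.
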
I think that something like this would do a better job of showcasing the top nodes, while giving even more nodes a chance to be seen.

UPDATE: Here is a revised code sample that does the same as the above, except that it reads from the highest-reputation node to the lowest, because I've been told that this is better. (That complicates it, but oh well.)

```
# I'm assuming that sth returns a long list of nodes ordered
# from highest rep to lowest and then oldest to newest.
my @selected;
my @filler;
my $limit = 50;
while (my $row = $sth->fetchrow_hashref()) {
    if (rand(1) < 0.1) {
        # Keep the first row to land in a slot; it has the higher rep.
        $selected[rand($limit)] ||= $row;
    }
    elsif (@filler < $limit) {
        # Remember the first unpicked rows in case some slots stay empty.
        push @filler, $row;
    }
}
# Fill any slot that was never picked.
for (0..($#filler)) {
    $selected[$_] ||= $filler[$_];
}
```
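
To try the top-down version without a database, something like the following might do. It is a sketch only: `MockSth` is a hypothetical stand-in that implements nothing but `fetchrow_hashref`, `select_best` is an illustrative wrapper around the loop above, and the 5000 rows with a `rank` field are made-up test data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a statement handle: it hands back one prepared
# row per call to fetchrow_hashref, then undef once the rows run out.
{
    package MockSth;
    sub new {
        my ($class, @rows) = @_;
        return bless { rows => [@rows] }, $class;
    }
    sub fetchrow_hashref {
        my ($self) = @_;
        return shift @{ $self->{rows} };
    }
}

# The top-down selection from the post, wrapped in a sub (illustrative name).
sub select_best {
    my ($sth, $limit) = @_;
    my (@selected, @filler);
    while (my $row = $sth->fetchrow_hashref()) {
        if (rand(1) < 0.1) {
            $selected[rand($limit)] ||= $row;
        }
        elsif (@filler < $limit) {
            push @filler, $row;
        }
    }
    for (0 .. $#filler) {
        $selected[$_] ||= $filler[$_];
    }
    return @selected;
}

# Made-up data: rank 1 is the highest-reputation node.
my @rows     = map { +{ rank => $_ } } 1 .. 5000;
my @selected = select_best(MockSth->new(@rows), 50);

my $top10 = grep { $_->{rank} <= 10 } @selected;
printf "%d nodes selected, %d of them from the top 10\n",
    scalar(@selected), $top10;
```

Running it a few times gives a feel for how often the top nodes show up and how far down the list the selection reaches.
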