in reply to Renovating Best Nodes

I like it, but I think that it would be nice to weight Selected Best Nodes more heavily towards the top few nodes.

As things stand, most of the time this is run we don't get a single node from the top 20, and more than a quarter of the time we get nothing from the top 50.

Of course getting a broad selection is good as well.

The following snippet shows how you can balance the two fairly flexibly:

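# I'm assuming that $sth returns a long list of nodes ordered
# from lowest rep to highest and then newest to oldest.
my @selected;
# Seed the 50 slots with the first (lowest-rep) rows.
for (1..50) {
    push @selected, $sth->fetchrow_hashref();
}
# Each later (higher-rep) row has a 10% chance of overwriting a random slot,
# so the rows read last, the top nodes, are the most likely to survive.
while (my $row = $sth->fetchrow_hashref()) {
    if (rand(1) < 0.1) {
        $selected[rand(@selected)] = $row;
    }
}
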
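To try the snippet without a database handle, here is a minimal sketch that wraps the same logic in a hypothetical pick_nodes sub, with a plain array standing in for $sth (assumed ordered from lowest reputation to highest, as above). The mock node IDs 1..4000 are made up, with 4000 playing the top node.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: @$rows stands in for the rows $sth would return,
# ordered from lowest reputation to highest.
sub pick_nodes {
    my ($rows, $limit, $p) = @_;
    # Seed the slots with the first $limit (lowest-rep) rows.
    my @selected = @{$rows}[0 .. $limit - 1];
    for my $row (@{$rows}[$limit .. $#$rows]) {
        # With probability $p, overwrite a uniformly random slot.
        $selected[rand(@selected)] = $row if rand(1) < $p;
    }
    return @selected;
}

# Usage with mock data: IDs 1..4000, where a higher ID means higher rep.
my @picked = pick_nodes([1 .. 4000], 50, 0.1);
printf "%d nodes picked; best rank: %d\n", scalar @picked, 4001 - max(@picked);
```

Because every row is written to at most one slot, the 50 picks are always distinct nodes.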
The resulting distribution has the following properties (back of the envelope calculation):
  1. Most of the time (about 65%) we get at least one node from the top 10.
  2. The odds of getting nothing from the top 50 are about 0.5%.
  3. A node's chance of getting in is better than under the current scheme if it is in the top 692 or outside the top 2000.
  4. About 60% of the time at least one node from outside the top 2000 is included.
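These numbers can be reproduced in closed form. A rough sketch, assuming p = 0.1 and 50 slots as in the snippet, and assuming the current scheme amounts to a uniform pick of 50 nodes from the top 2000 (that reading is inferred, not stated above). A node with k-1 nodes above it ends up selected with probability p(1 - p/50)^(k-1): it has to be picked, and then none of the k-1 rows read after it may land in its slot. The crossover this gives lands within a rank or so of 692, depending on how ranks are counted.

```perl
use strict;
use warnings;

# Assumed parameters: p = chance a row is taken, n = number of slots,
# pool = size of the pool the current (assumed uniform) scheme draws from.
my ($p, $n, $pool) = (0.1, 50, 2000);
my $q = 1 - $p / $n;    # chance a later row leaves a given slot alone

# 1. At least one of the top 10 survives unless none of them is picked.
my $top10 = 1 - (1 - $p)**10;

# 2. Nothing from the top 50.
my $miss50 = (1 - $p)**50;

# 3. Rank r survives with probability p * q**(r-1). Step forward while
#    rank k+1 still beats the uniform chance n / pool.
my $k = 1;
$k++ while $p * $q**$k > $n / $pool;

# 4. Some slot is never overwritten by the top 2000 rows
#    (treating the slots as independent, which is an approximation).
my $below = 1 - (1 - $q**$pool)**$n;

printf "top 10: %.3f  miss top 50: %.4f  crossover rank: %d  below 2000: %.3f\n",
    $top10, $miss50, $k, $below;
```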
I think that something like this would do a better job of showcasing the top nodes, while giving even more nodes a chance to be seen.

UPDATE: Here is a modified version that does the same thing as the code above, except that it reads from the highest-reputation node to the lowest, because I've been told that this is better. (That complicates it a little, but oh well.)

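# I'm assuming that $sth returns a long list of nodes ordered
# from highest rep to lowest and then oldest to newest.
my @selected;
my @filler;
my $limit = 50;
while (my $row = $sth->fetchrow_hashref()) {
    if (rand(1) < 0.1) {
        # Claim a random slot, but only if it is still empty: reading
        # from the top down, the first row to land in a slot keeps it.
        $selected[rand($limit)] ||= $row;
    }
    elsif (@filler < $limit) {
        # Keep the best rows that were not picked, as a fallback.
        push @filler, $row;
    }
}
# Fill any slot that nothing landed in with the top unpicked rows.
for (0..$#filler) {
    $selected[$_] ||= $filler[$_];
}
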
This does the same thing as the first snippet, except that slots where nothing got chosen by chance are filled in with top nodes rather than bottom nodes. If you fetch 4000 nodes, you will only need the filler about 1.68% of the time. Alternatively, you can change the 0.1 to 0.2 and keep fetching 2000 nodes. Or you can fetch 2000, leave the parameter at 0.1, and accept that about 60% of the time an extra node from roughly the top 55 gets spotlighted.
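The filler percentages can be checked the same way. A rough sketch, with filler_chance as a hypothetical helper: each row lands in a given slot with probability p/50, so a slot is still empty after R rows with probability (1 - p/50)^R, and treating the 50 slots as independent gives the chance that at least one of them needs the filler. This approximation comes out at about 1.65% for 4000 rows at 0.1 (close to the figure quoted), a similar value for 2000 rows at 0.2, and about 60% for 2000 rows at 0.1.

```perl
use strict;
use warnings;

# Hypothetical helper: chance that at least one of $n slots is still
# empty after $rows rows, so the filler list has to be used. Slots are
# treated as independent, which is an approximation.
sub filler_chance {
    my ($p, $n, $rows) = @_;
    my $empty = (1 - $p / $n)**$rows;    # one given slot is never hit
    return 1 - (1 - $empty)**$n;
}

printf "%-22s %.2f%%\n", @$_ for
    ["fetch 4000, p = 0.1", 100 * filler_chance(0.1, 50, 4000)],
    ["fetch 2000, p = 0.2", 100 * filler_chance(0.2, 50, 2000)],
    ["fetch 2000, p = 0.1", 100 * filler_chance(0.1, 50, 2000)];
```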

Many other ways to tweak this exist.