I like it, but I think that it would be nice to weight Selected Best Nodes more heavily towards the top few nodes.

As things stand, most times this is run, we don't have a single node in the top 20. More than a quarter of the time we won't get anything in the top 50.

Of course getting a broad selection is good as well.

The following snippet shows how you can balance the two fairly flexibly:

    # I'm assuming that sth returns a long list of nodes ordered
    # from lowest rep to highest and then newest to oldest.
    my @selected;
    for (1..50) {
        push @selected, $sth->fetchrow_hashref();
    }
    while (my $row = $sth->fetchrow_hashref()) {
        if (rand(1) < 0.1) {
            $selected[rand(@selected)] = $row;
        }
    }

The resulting distribution has the following properties (back of the envelope calculation):
  1. Most of the time we have some node in the top 10.
  2. Our odds of not getting something in the top 50 are about 0.5%.
  3. A node's chance of getting in is better than under the current scheme if it is in the top 692, or out of the top 2000.
  4. About 60% of the time we get a node out of the top 2000 included.
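
The first three figures can be checked with a few lines of arithmetic. This is only a sketch of the back-of-the-envelope reasoning, not code from the site: the sub names are made up for illustration, and `$total = 2000` is an assumption about how many nodes the current scheme draws its 50 from.

```perl
use strict;
use warnings;

# Assumed parameters: $p and $slots come from the snippet above; $total is
# a guess at how many nodes the current scheme draws from.
my $p     = 0.1;     # chance that a fetched row is considered
my $slots = 50;      # size of @selected
my $total = 2000;    # assumption: current scheme picks 50 of the top 2000

# Chance that none of the last $n rows (the top $n nodes) ends up selected.
# Once one of them is picked, only a later, higher-ranked row can replace it.
sub miss_chance {
    my ($n) = @_;
    return (1 - $p) ** $n;
}

# Chance that the row at $rank (1 = highest rep, read last) is in the result:
# it must be picked, and no later row may land on the same slot.
sub survival_chance {
    my ($rank) = @_;
    return $p * (1 - $p / $slots) ** ($rank - 1);
}

# Rank at which survival_chance falls to the uniform rate $slots / $total.
sub break_even_rank {
    my $uniform = $slots / $total;
    return 1 + log($uniform / $p) / log(1 - $p / $slots);
}

printf "some top-10 node shown: %.1f%%\n", 100 * (1 - miss_chance(10));
printf "no top-50 node shown:   %.2f%%\n", 100 * miss_chance(50);
printf "break-even rank:        %.0f\n",   break_even_rank();
```

Under these assumptions the numbers land close to the ones quoted: a top-10 node shows up roughly two times in three, the top 50 are missed about 0.5% of the time, and the break-even rank is in the neighbourhood of 692.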
I think that something like this would do a better job of showcasing the top nodes, while giving even more nodes a chance to be seen.

UPDATE: Here is a changed code sample that does the same as the above, only it reads from the highest reputation node to the lowest because I've been told that this is better. (A fact that complicates it, but oh well.)

    # I'm assuming that sth returns a long list of nodes ordered
    # from highest rep to lowest and then oldest to newest.
    my @selected;
    my @filler;
    my $limit = 50;
    while (my $row = $sth->fetchrow_hashref()) {
        if (rand(1) < 0.1) {
            $selected[rand($limit)] ||= $row;
        }
        elsif (@filler < $limit) {
            push @filler, $row;
        }
    }
    for (0..($#filler)) {
        $selected[$_] ||= $filler[$_];
    }

This does the same thing as the snippet above, except that any slot where nothing got chosen by chance is filled in with a top node rather than a bottom node. If you fetch 4000 nodes, then you will only need the filler 1.68% of the time. Alternatively, you can change the 0.1 to 0.2 and leave the number of nodes that you fetch at 2000. Or you can fetch 2000, leave the parameter at 0.1, and say that people don't mind seeing an extra one of the top 55 nodes or so get spotlighted 60% of the time.
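
To compare those three tweaks, here is a rough calculation of how often the filler is needed at all. It is a sketch, not part of the proposal: `filler_chance` is a hypothetical helper, and it treats the slots as independent, which is only approximately true.

```perl
use strict;
use warnings;

# Approximate chance that at least one of $limit slots is never hit by
# chance after $rows rows, each considered with probability $p.
# Assumption: slots are treated as independent, which slightly simplifies
# the real behaviour.
sub filler_chance {
    my ($rows, $p, $limit) = @_;
    my $slot_empty = (1 - $p / $limit) ** $rows;   # one given slot stays empty
    return 1 - (1 - $slot_empty) ** $limit;        # some slot stays empty
}

# The three combinations discussed above: [rows fetched, selection chance].
for my $case ([4000, 0.1], [2000, 0.2], [2000, 0.1]) {
    my ($rows, $p) = @$case;
    printf "rows=%d p=%.1f filler needed: %.2f%%\n",
        $rows, $p, 100 * filler_chance($rows, $p, 50);
}
```

With this approximation the first two settings come out at about 1.7% each, and the third at roughly 60%, which is in line with the figures quoted above.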

Many other ways to tweak this exist.

In reply to Re: Renovating Best Nodes by tilly
in thread Renovating Best Nodes by demerphq
