PerlMonks  

Re: Distribution of Levels and Writeups

by graff (Chancellor)
on Oct 01, 2004 at 05:46 UTC


in reply to Distribution of Levels and Writeups

There is a basic flaw in your tally of write-ups per group, and you probably don't have sufficient data to fix it.

You have taken the total number of posts by someone who is now a "saint", and added that to the sum of "posts created by saints". But you don't know how many of this person's posts were submitted before he or she became a saint.

To summarize the proportion of nodes from non-saints "in general", you'd need to do extra work on each person who is not an initiate, to determine how many nodes they wrote at each of the levels they passed through, and distribute those numbers properly among the various levels.
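A minimal Python sketch of that per-level tally follows. It assumes that, for each user, you can get the dates of their promotions and the dates of their posts, which (as noted above) the available stats probably don't provide. The function names and the sample history are hypothetical. Each post is credited to the level its author held when it was posted, not to the author's current level.

```python
from bisect import bisect_right
from collections import Counter


def tally_posts_by_level(promotions, post_times):
    """Credit each post to the level its author held at posting time.

    promotions: list of (time, level_name) sorted by time, starting with the
        level held when the account was created.
    post_times: iterable of post timestamps for the same user.
    """
    times = [t for t, _ in promotions]
    counts = Counter()
    for pt in post_times:
        # Index of the last promotion at or before this post.
        i = bisect_right(times, pt) - 1
        if i < 0:
            raise ValueError(f"post at {pt} predates the first recorded level")
        counts[promotions[i][1]] += 1
    return counts


def tally_site(users):
    """Sum the per-level post counts over all users.

    users: iterable of (promotions, post_times) pairs.
    """
    total = Counter()
    for promotions, post_times in users:
        total.update(tally_posts_by_level(promotions, post_times))
    return total


# Hypothetical user: Initiate at t=0, Novice at t=10, Acolyte at t=25
# (times in arbitrary units).
promos = [(0, "Initiate"), (10, "Novice"), (25, "Acolyte")]
posts = [1, 3, 9, 12, 20, 30, 31]
print(dict(tally_posts_by_level(promos, posts)))
```

For this made-up user, three posts land under Initiate, two under Novice and two under Acolyte, whereas tallying by current level would put all seven under Acolyte.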

Of course, another factor in the "imbalance" is the "graduated" scaling of the XP thresholds. The trip from Initiate to Monk involves steps of 20, 50, 100 and 200 XP. If an average non-clueless node yields about 5 XP, Initiates don't get to post more than 4 nodes or so before they cease to be Initiates; and with just another 10 nodes or so (not to mention XP derived from voting), they cease to be Novices. This tends to limit the total node contribution from these groups; the stats page shows about 25K initiates with about 33K posts among them, which is probably close to the limit of how many nodes can be owned by that many initiates at any one time.
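Here is the arithmetic behind those estimates as a small Python sketch. It reads the 20, 50, 100 and 200 XP figures as the XP needed to clear each successive level, which is how the estimates of 4 nodes and then about 10 more work out. It assumes a flat 5 XP per node and, like the paragraph, ignores XP from voting. The function name is made up for illustration.

```python
import math
from itertools import accumulate


def nodes_per_level(xp_steps, xp_per_node):
    """Nodes needed to clear each level if every node earns xp_per_node XP."""
    return [math.ceil(step / xp_per_node) for step in xp_steps]


# XP steps from Initiate up to Monk, and ~5 XP per non-clueless node.
steps = [20, 50, 100, 200]
per_level = nodes_per_level(steps, 5)
running_total = list(accumulate(per_level))

print("nodes per level:", per_level)        # [4, 10, 20, 40]
print("cumulative nodes:", running_total)   # [4, 14, 34, 74]
```

Under these assumptions, a user leaves the two lowest levels after about 14 nodes, so those groups can never hold many nodes per member.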

And frankly, I think the coded pyramid layout, while eye-catching and portentous, gives a misleading sense of proportion when two groups of roughly equal size dominate the distribution. In the write-ups picture, it seems like the A's have a vast dominance over the lowly B's. It takes some time to count all those letters and realize it's a difference of 42% vs. 37%, which isn't nearly as big a difference as it appears to be in the diagram.

Suppose you had just two groups of 50% each arranged in this sort of pyramid. Whichever group you put on top would occupy 7 full rows plus one cell in the eighth row, while the other group would occupy just the three bottom rows (minus the one cell taken by the top group). Show that picture to any casual observer and ask "Do you think there are more letters in one group than the other? If so, which group has more letters?"
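For anyone who wants to try that experiment, here is a rough Python sketch that fills a letter pyramid top-down with two 50% groups and prints it. It assumes a 10-row pyramid whose rows hold 1, 3, 5, ..., 19 cells (100 cells in all, so one cell per percent), which matches the row counts described above. The original diagram's exact geometry isn't given here, and the function names are hypothetical.

```python
def row_widths(n_rows):
    """Widths of a triangular layout: 1, 3, 5, ... cells per row."""
    return [2 * i + 1 for i in range(n_rows)]


def fill_pyramid(n_rows, groups):
    """Fill rows top-down, using up one group before starting the next.

    groups: list of (label, cells); the cells must add up to the pyramid size.
    Returns one list of (label, cells) pairs per row.
    """
    widths = row_widths(n_rows)
    if sum(cells for _, cells in groups) != sum(widths):
        raise ValueError("group sizes must fill the pyramid exactly")
    pending = [[label, cells] for label, cells in groups if cells > 0]
    g = 0
    rows = []
    for width in widths:
        row, free = [], width
        while free > 0:
            label, left = pending[g]
            take = min(free, left)
            row.append((label, take))
            pending[g][1] -= take
            free -= take
            if pending[g][1] == 0:
                g += 1  # this group is used up; move on to the next one
        rows.append(row)
    return rows


def render(rows):
    """Draw each row as repeated letters, centered like a pyramid."""
    lines = ["".join(label * cells for label, cells in row) for row in rows]
    full = len(lines[-1])
    return "\n".join(line.center(full).rstrip() for line in lines)


print(render(fill_pyramid(10, [("A", 50), ("B", 50)])))
```

Group A fills the first seven rows plus one cell of the eighth. B's 50 cells are squeezed into the bottom three rows, even though the two counts are identical.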

With display techniques like this, it's no wonder that the phrase "Lies, Damn Lies, and Statistics" is so well known.

Re^2: Distribution of Levels and Writeups
by jZed (Prior) on Oct 01, 2004 at 15:13 UTC
    Thanks for your comments here and the many other posts you've made helping the rest of us start to understand statistics - I very much value your contributions.

    If I had claimed some particular conclusion that could be drawn from these pictures, you'd be right that that conclusion wouldn't be very valid. And you're right that the pyramids (like any form of expression) foreground some things and background others. An even worse flaw is that the levels *are based on* the writeups so that obviously there will be more at the top - how could acolytes ever accumulate a large number of writeups? As soon as the poor Sisyphean acolytes made a large number of posts (assuming non-zero XP on those posts) they would no longer be acolytes.

    And agreed, the pyramids are not at all good for fine-level distinctions. What they show, and show very clearly, is the general relationship of readers of PM to writers of PM: that about 4% of the registered users contribute close to 80% of the writeups (regardless of what level they were when they made the posts). As for the comparison of saints to the four groups below them, what my eye tells me is that they have roughly the same number of writeups but that there are three times as many in the friars-through-pontiffs group as there are saints.

    Does my post represent anything more than eye-candy? Probably not, but hopefully a few people enjoyed it.

Re^2: Distribution of Levels and Writeups (bias)
by tye (Sage) on Oct 01, 2004 at 08:14 UTC

    I don't see where you are reading that the statistics are supposed to be counted based on the level of the monk at the time of node creation. I think you just assumed this interpretation and then complained about the data not matching your assumption.

    I noticed the bias in the triangular display. Thanks for commenting on it.

    - tye        
