

in reply to Distribution of Levels and Writeups

There is a basic flaw in your tally of write-ups per group, and you probably don't have sufficient data to fix it.

You have taken the total number of posts by someone who is now a "saint", and added that to the sum of "posts created by saints". But you don't know how many of this person's posts were submitted before he or she became a saint.

To summarize the proportion of nodes from non-saints "in general", you'd need to do extra work on each person who is not an initiate, to determine how many nodes they wrote at each of the levels they passed through, and distribute those numbers properly among the various levels.

Of course, another factor in the "imbalance" is the "graduated" scaling of the XP thresholds. The trip from Initiate to Monk involves steps of 20, 50, 100 and 200 XP. If an average non-clueless node yields about 5 XP, Initiates don't get to post more than 4 nodes or so before they cease to be Initiates; and with just another 10 nodes or so (not to mention XP derived from voting), they cease to be Novices. This tends to limit the total node contribution from these groups. The stats page shows about 25K initiates with about 33K posts among them, which is probably close to the limit of how many nodes can be owned by that many initiates at any one time.
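To make that arithmetic concrete, here is a rough sketch in Python. It assumes each quoted figure is the XP needed for that one step, that every node earns exactly 5 XP, and that voting XP is ignored; the names are made up for illustration, and none of this is how the site actually computes anything.

```python
import math

# Assumption: each figure is the XP needed for that step, as quoted in the post.
STEPS = [20, 50, 100, 200]
# Assumption: the post's rough figure for an average non-clueless node.
XP_PER_NODE = 5


def nodes_per_step(steps, xp_per_node):
    """Nodes needed to cover each step, ignoring XP earned by voting."""
    return [math.ceil(step / xp_per_node) for step in steps]


nodes = nodes_per_step(STEPS, XP_PER_NODE)
print(nodes)

# Average posts per initiate, from the stats-page figures quoted in the post.
avg_posts = 33_000 / 25_000
print(round(avg_posts, 2))
```

Under those assumptions the first two steps take 4 and 10 nodes, and the initiates on the stats page average only about 1.3 posts each.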

And frankly, I think the coded pyramid layout, while eye-catching and portentous, gives a misleading sense of proportion when two groups of roughly equal size dominate the distribution. In the write-ups picture, it seems like the A's have a vast dominance over the lowly B's. It takes some time to count all those letters and realize it's a difference of 42% vs. 37%, which isn't nearly as big a difference as it appears to be in the diagram.

Suppose you had just two groups of 50% each arranged in this sort of pyramid. Whichever group you put on top would occupy 7 full rows plus one cell in the eighth row, while the other group would occupy just the three bottom rows (minus the one cell taken by the top group). Show that picture to any casual observer and ask "Do you think there are more letters in one group than the other? If so, which group has more letters?"
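The two-group case can be checked with a small Python sketch. It assumes a 10-row pyramid with rows of 1, 3, 5, ... 19 cells (100 cells in all), which is my inference from the row counts above rather than something taken from the original diagram's code; the function names are hypothetical.

```python
def pyramid_rows(counts, n_rows=10):
    """Fill an odd-width pyramid top-down with labels.

    counts maps label -> number of cells, in top-to-bottom order.
    Assumption: row r (0-based) holds 2*r + 1 cells.
    """
    widths = [2 * r + 1 for r in range(n_rows)]
    if sum(counts.values()) != sum(widths):
        raise ValueError("cell counts must fill the pyramid exactly")
    cells = [label for label, n in counts.items() for _ in range(n)]
    rows, start = [], 0
    for width in widths:
        rows.append(cells[start:start + width])
        start += width
    return rows


def full_rows(rows, label):
    """Number of rows made up entirely of this label."""
    return sum(1 for row in rows if set(row) == {label})


def rows_touched(rows, label):
    """Number of rows containing at least one cell of this label."""
    return sum(1 for row in rows if label in row)


rows = pyramid_rows({"A": 50, "B": 50})
for row in rows:
    print("".join(row).center(19))  # 19 is the width of the bottom row

print(full_rows(rows, "A"), rows_touched(rows, "B"))
```

With those assumptions, A fills seven complete rows and one cell of the eighth, while B's equal share is squeezed into the bottom three rows.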

With display techniques like this, it's no wonder that the phrase "Lies, Damn Lies, and Statistics" is so well known.