How do we find statistical distribution of a given set of numbers?

by Sameet (Beadle)
on May 02, 2004 at 09:36 UTC ( [id://349786]=perlquestion )

Sameet has asked for the wisdom of the Perl Monks concerning the following question:

Hi! Monks!
The question is important. If I am using a random number generator, there should be a measure of its randomness! Is there a way to measure the randomness of a series of numbers? Or is there a way to find out whether a given set of numbers follows a particular distribution such as Normal, Poisson, Weibull, etc.?

I am really looking forward to an answer.
Regards
Sameet

Replies are listed 'Best First'.
Re: How do we find statistical distribution of a given set of numbers?
by matija (Priest) on May 02, 2004 at 11:42 UTC
    Yes, there are a whole lot of tests. The simplest test of randomness is to divide the range of the generator into buckets (let us say 10, 100, or 1000 buckets), generate a lot of random numbers, and plot how many generated numbers fall into each bucket. A good uniform random generator will have approximately the same number of hits in each bucket (but so will a simple counter). Anyway, from the shape of the curve you can usually make a pretty good guess about which distribution (if any) is being generated.
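
    The bucket test described above can be sketched roughly as follows. This is only an illustration in Python (the same loop ports directly to Perl); the function name bucket_counts, the fixed seed, and the choice of 10 buckets over the range 0 to 1 are assumptions made for the example, not anything prescribed in this thread.

```python
import random

def bucket_counts(samples, n_buckets, low=0.0, high=1.0):
    """Count how many samples fall into each of n_buckets equal-width buckets."""
    counts = [0] * n_buckets
    width = (high - low) / n_buckets
    for x in samples:
        # Clamp so that a sample equal to `high` lands in the last bucket.
        index = min(int((x - low) / width), n_buckets - 1)
        counts[index] += 1
    return counts

rng = random.Random(12345)  # fixed seed (arbitrary) so the run is repeatable
samples = [rng.random() for _ in range(100_000)]
counts = bucket_counts(samples, 10)

# Crude text "plot": one row per bucket, one '#' per 500 hits.
for i, c in enumerate(counts):
    print(f"bucket {i}: {c:6d} {'#' * (c // 500)}")
```

    For a uniform generator the rows should come out at roughly equal length; a bell-shaped or skewed outline hints at some other distribution.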

    A very common test of randomness is the Chi square test, which you can find at Statistics::ChiSquare.
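
    For readers who want to see what the statistic itself looks like, here is a hand-rolled sketch in Python. It is not the interface of Statistics::ChiSquare; the function name chi_square_uniform, the seed, and the 10-bucket setup are assumptions for illustration. The constant 16.919 is the standard 5% critical value of the chi-square distribution with 9 degrees of freedom.

```python
import random

def chi_square_uniform(counts):
    """Pearson chi-square statistic for observed counts against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

rng = random.Random(2004)  # arbitrary fixed seed
n_buckets = 10
counts = [0] * n_buckets
for _ in range(10_000):
    counts[rng.randrange(n_buckets)] += 1

statistic = chi_square_uniform(counts)

# 5% critical value for n_buckets - 1 = 9 degrees of freedom.
CRITICAL_5_PERCENT_DF9 = 16.919
verdict = "reject" if statistic > CRITICAL_5_PERCENT_DF9 else "do not reject"
print(f"chi-square = {statistic:.2f}: {verdict} uniformity at the 5% level")
```

    A statistic above the critical value suggests the bucket counts are too uneven to be uniform; a value below it only means this particular test found nothing wrong.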

    Knuth (the legend in computer science circles) wrote extensively on the subject of testing (and writing) pseudo-random number generators in his seminal work "The Art of Computer Programming" (vol. 2: Seminumerical Algorithms). You can also find a lot of algorithms here. However, they are in FORTRAN.

      Thanks, I am downloading the Statistics::ChiSquare from CPAN. Thank you for your help
      regards
      Sameet

        Be careful!

        Generating random numbers is hard. The chi-squared test is of very limited value. Numbers which fail are likely not usefully random, but numbers which pass might be pretty useless as well.
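
        A small sketch of why passing is not enough, in Python. A plain counter cycling through 0..9 gets a perfect chi-square score, yet a lag-1 serial correlation check (one of many possible follow-up checks, chosen here as an assumption; it is not something named in this thread) shows it is anything but random. The helper names are hypothetical.

```python
import random

def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

def lag1_correlation(values):
    """Sample correlation between each value and its successor."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum(
        (values[i] - mean) * (values[i + 1] - mean) for i in range(n - 1)
    )
    return covariance / variance

# A "generator" that just counts 0, 1, ..., 9, 0, 1, ...
counter = [i % 10 for i in range(10_000)]
counter_counts = [counter.count(b) for b in range(10)]

# A real pseudo-random sequence for comparison (arbitrary fixed seed).
rng = random.Random(99)
shuffled = [rng.randrange(10) for _ in range(10_000)]

print("counter chi-square:", chi_square_uniform(counter_counts))
print("counter lag-1 correlation:", round(lag1_correlation(counter), 3))
print("random  lag-1 correlation:", round(lag1_correlation(shuffled), 3))
```

        Every bucket of the counter holds exactly 1000 hits, so its chi-square statistic is zero, while its lag-1 correlation is far from the near-zero value expected of independent draws.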

        As with many things in life, it depends on what you want. If you're rolling dice in D&D, almost any RNG will do fine. If you are simulating a time series model with high-dimensional data, only the very best RNGs will be any use at all.

        Testing randomness is not trivial. I wouldn't like to do it in Perl, though I'm sure it's possible. George Marsaglia has a useful page on his Diehard tester.

        Other resources include :-

        Good luck

        -- Anthony Staines
Re: How do we find statistical distribution of a given set of numbers?
by PERLscienceman (Curate) on May 02, 2004 at 11:25 UTC
    Greetings Sameet:
    When checking for "randomness" of data, the first module that comes to my mind is Statistics::ChiSquare. However, if I were you I would run a general search on CPAN for Statistics modules. The search will yield a whole host of additional modules that I am certain you will find quite useful for your current task and perhaps future ones as well.
Re: How do we find statistical distribution of a given set of numbers?
by toma (Vicar) on May 03, 2004 at 01:41 UTC
    One test is to make a gray-scale bitmap out of your random numbers and see if there are patterns in it.
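
    A minimal sketch of the bitmap idea in Python, assuming one random byte per pixel and the plain PGM (P5) image format, which most image viewers can open. This is not the Snowcrash Test code itself; the function name write_pgm, the 256x256 size, the seed, and the output file name are all assumptions for illustration.

```python
import os
import random
import tempfile

def write_pgm(path, pixels, width, height):
    """Write 8-bit gray values (0-255) as a binary PGM (P5) image."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    with open(path, "wb") as handle:
        handle.write(f"P5\n{width} {height}\n255\n".encode("ascii"))
        handle.write(bytes(pixels))

width, height = 256, 256
rng = random.Random(42)  # arbitrary fixed seed
pixels = [rng.randrange(256) for _ in range(width * height)]

# Hypothetical output location: the system temp directory.
output_path = os.path.join(tempfile.gettempdir(), "noise.pgm")
write_pgm(output_path, pixels, width, height)
print("wrote", output_path)
```

    A decent generator should give featureless static; stripes, blocks, or repeating textures point to structure in the sequence.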

    I did this to create what I called The Snowcrash Test to show that it is not a good idea to re-seed a random number generator.

    It should work perfectly the first time! - toma
Re: How do we find statistical distribution of a given set of numbers?
by hawtin (Prior) on May 02, 2004 at 22:47 UTC

    The general answer to this is that you can't.

    What you can do is hypothesise that the numbers have, for example, a uniform distribution, then feed in the values you got and work out how probable it is that you would get a set of numbers at least this extreme by chance if the hypothesis were true (a significance test). Generally, if that probability is less than 1% (or 0.1% if you're fussy, 5% if you're not), you reject the hypothesis that the numbers have that distribution.
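
    One way to sketch such a significance test is the Kolmogorov-Smirnov test, shown here in Python. That choice is an assumption for illustration (the thread does not name it), as are the helper names and the seed. The hypothesised distribution is a fully specified standard normal; if the mean and standard deviation were estimated from the same data, the critical value below would no longer apply. The constant 1.358 gives the usual large-sample 5% critical value.

```python
import math
import random

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of samples and the hypothesised cdf."""
    ordered = sorted(samples)
    n = len(ordered)
    largest = 0.0
    for i, x in enumerate(ordered):
        theoretical = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        largest = max(largest, (i + 1) / n - theoretical, theoretical - i / n)
    return largest

rng = random.Random(7)  # arbitrary fixed seed
n = 2000
normal_samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
uniform_samples = [rng.uniform(-3.0, 3.0) for _ in range(n)]

# Asymptotic 5% critical value for the KS statistic with large n.
critical = 1.358 / math.sqrt(n)
for name, data in (("gauss", normal_samples), ("uniform", uniform_samples)):
    d = ks_statistic(data, normal_cdf)
    verdict = "reject" if d > critical else "do not reject"
    print(f"{name}: D = {d:.4f}, critical = {critical:.4f} -> {verdict} N(0, 1)")
```

    Note the asymmetry: the uniform data is rejected decisively, but "do not reject" for the other set is not proof that it is normal, only that this test found no evidence against it.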

    As has previously been noted, it is very easy to get the maths wrong.

    But as for the distribution, it is like science: showing that a set of numbers does not follow a particular distribution is often easy, since you just need one counter-example, but you can never prove what the distribution is, because the counter-example might be the next number.
