Recently, I came up against a real-life situation involving complex curves and the areas under them. I dutifully combed the internet and dug out my old probability and statistics textbook to answer the question, "How do I find the overlap of two normal distributions?" (A normal distribution is the famous "bell curve".)
Empirically comparing normal distributions

Sampled at discrete points, a normal distribution becomes a graph of probabilities, which is really just an array. Perl is great for handling this kind of thing, so I wrote a method that converts a mean and standard deviation into an array plotting out what a population of 100 would look like. I did the same for the second mean and SD and then compared the results: if a point appeared in both arrays, it was counted as common. Then a simple look at the ratio of 'common' points to the total population, and presto! A simple percentage of similarity!
As an added bonus, I could increase the population to arbitrarily high numbers (1000 or 10000 points) to get better resolution on the graphs and, hence, more accurate percentages. I could have my script plot out hundreds of graphs and weigh them for me.
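A minimal sketch of the counting idea above, not the original script: each curve is sampled at integer x values, each sample is turned into a whole number of population members with int(), and at every x the smaller of the two counts is the number of points the curves have in common. The names normal_pdf, curve_counts and common_percentage are hypothetical, and the 4-SD window and the step of 1 along x are assumptions chosen for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use POSIX qw(floor ceil);

# Normal probability density at $x (same formula as int_gen_curve,
# written here as a plain function instead of a method).
sub normal_pdf {
    my ($x, $mean, $sdev) = @_;
    my $pi = 4 * atan2(1, 1);
    return exp(-(($x - $mean) ** 2) / (2 * $sdev ** 2)) / sqrt(2 * $pi * $sdev ** 2);
}

# Hypothetical helper: sample a curve at every integer x in [$lo, $hi]
# and return a hash of x => number of population members at that x.
sub curve_counts {
    my ($mean, $sdev, $population, $lo, $hi) = @_;
    my %counts;
    $counts{$_} = int($population * normal_pdf($_, $mean, $sdev)) for $lo .. $hi;
    return \%counts;
}

# Hypothetical helper: percentage of the first population that also
# appears in the second. At each x, min(n1, n2) points are common.
sub common_percentage {
    my ($m1, $sd1, $m2, $sd2, $population) = @_;
    # assumed window: out to 4 SD on either side of both means
    my $lo = floor(min($m1 - 4 * $sd1, $m2 - 4 * $sd2));
    my $hi = ceil(max($m1 + 4 * $sd1, $m2 + 4 * $sd2));
    my $c1 = curve_counts($m1, $sd1, $population, $lo, $hi);
    my $c2 = curve_counts($m2, $sd2, $population, $lo, $hi);

    my ($common, $total) = (0, 0);
    for my $x ($lo .. $hi) {
        $common += min($c1->{$x}, $c2->{$x});
        $total  += $c1->{$x};
    }
    return $total ? 100 * $common / $total : 0;
}

# Larger populations lose proportionally less to int() truncation,
# which is the resolution gain described above.
for my $population (100, 1000, 10_000) {
    printf "population %6d: %5.1f%% in common\n",
        $population, common_percentage(50, 10, 60, 10, $population);
}
```

Counting common points with min() and subtracting one array from the other are two views of the same thing: min(n1, n2) = n1 - max(n1 - n2, 0), so both give the same percentage.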
A quick search turned up a GPL'd method for finding the probability density at a data point (roughly, the fraction of the population per unit step along the x axis), given the mean and SD:
    # Probability density of a normal distribution at $x
    sub int_gen_curve {
        my ($self, $x, $mean, $sdev) = @_;
        my $pi = 3.14159265358979323846;
        return 1 / ( sqrt( 2 * $pi * $sdev * $sdev ))
             * exp( -($x-$mean)*($x-$mean) / ( 2 * $sdev * $sdev ));
    }

Where '$x' is a point on the x axis. Next, I wrote a simple routine for turning my mean and standard deviation into an array:

    use POSIX qw(floor ceil);

    sub compare_bell_curves {
        my ($self, $m1, $sd1, $m2, $sd2) = @_;
        # sample integer x values within 1.75 SD of the first mean
        my $lowerbound = floor($m1 - (1.75 * $sd1));
        my $upperbound = ceil($m1 + (1.75 * $sd1));
        my $area = 10000;                # size of the simulated population
        my ($data, $data1) = ({}, {});   # x => count; hashes, since x may be negative
        for my $x ($lowerbound .. $upperbound) {
            $data->{$x}  = int($area * $self->int_gen_curve($x, $m1, $sd1));
            $data1->{$x} = int($area * $self->int_gen_curve($x, $m2, $sd2));
        }
        # subtract $data1 from $data; any $data->{$x} that
        # remains positive is a point of difference
        my ($diff, $total) = (0, 0);
        for my $x (keys %$data) {
            $total += $data->{$x};
            my $left = $data->{$x} - $data1->{$x};
            $diff += $left if $left > 0;
        }
        # percentage of the sampled population the two curves share
        return $total ? 100 * (1 - $diff / $total) : 0;
    }

In the end, I just had to count the positive points left in $data after the subtraction and work out what percentage of the population they represented. If there were NO positive points ($data1 removed all of the points from $data), the two distributions were equal.
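For reference, here is what the subtraction is estimating, assuming a step of 1 along x, a standard deviation that is large compared with that step, and ignoring both the int() truncation and the points outside the sampled window. With population N and densities f_1, f_2, the leftover fraction approaches one minus the area the two curves share (the overlapping coefficient), and when both SDs equal σ that shared area has a closed form in terms of Φ, the standard normal CDF. For example, means 50 and 60 with σ = 10 give 2Φ(-0.5) ≈ 0.617, which makes a handy sanity check for the empirical percentage.

```latex
n_i(x) = \left\lfloor N\, f_i(x) \right\rfloor,
\qquad
f_i(x) = \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\!\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right)

\frac{\sum_x \max\bigl(n_1(x) - n_2(x),\, 0\bigr)}{\sum_x n_1(x)}
  \;\approx\; 1 - \int_{-\infty}^{\infty} \min\bigl(f_1(x),\, f_2(x)\bigr)\,dx

\int_{-\infty}^{\infty} \min\bigl(f_1(x),\, f_2(x)\bigr)\,dx
  = 2\,\Phi\!\left(-\frac{|\mu_1 - \mu_2|}{2\sigma}\right)
  \quad \text{when } \sigma_1 = \sigma_2 = \sigma
```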
Perl allowed me to reduce a fairly complex problem to its basic elements and solve it empirically, using real numbers. As an added bonus, I had datasets that could easily be plugged into GD::Graph to give an extra, visual representation of what the data said.
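A small sketch of getting sampled curves into the shape GD::Graph works with. Its plot() method takes a reference to an array of arrays: the first array holds the x values, and each following array holds one data set. The helper curves_for_gd_graph is hypothetical, the population of 10000 and the 20..90 range are assumptions, and the plotting calls are left as comments so the block runs without GD::Graph installed.

```perl
use strict;
use warnings;

# Normal probability density at $x (same formula as int_gen_curve).
sub normal_pdf {
    my ($x, $mean, $sdev) = @_;
    my $pi = 4 * atan2(1, 1);
    return exp(-(($x - $mean) ** 2) / (2 * $sdev ** 2)) / sqrt(2 * $pi * $sdev ** 2);
}

# Hypothetical helper: build the array-of-arrays layout GD::Graph's
# plot() takes. Row 0 holds the x values; each later row is one curve,
# given as population counts at those x values.
sub curves_for_gd_graph {
    my ($population, $lo, $hi, @curves) = @_;   # @curves: [mean, sdev] pairs
    my @xs   = ($lo .. $hi);
    my @data = (\@xs);
    for my $curve (@curves) {
        my ($mean, $sdev) = @$curve;
        push @data, [ map { int($population * normal_pdf($_, $mean, $sdev)) } @xs ];
    }
    return \@data;
}

my $data = curves_for_gd_graph(10_000, 20, 90, [50, 10], [60, 10]);

# With GD::Graph installed, the structure can be plotted directly, e.g.:
#   use GD::Graph::lines;
#   my $graph = GD::Graph::lines->new(600, 400);
#   my $gd = $graph->plot($data) or die $graph->error;
#   # then print $gd->png to a file opened with binmode
```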
Does anybody else think like this, or am I just kooky?