Excel or Perl for simple statistics?


Syntactic Confectionery Delight
	PerlMonks

Excel or Perl for simple statistics?

by Anonymous Monk

on Jan 20, 2018 at 13:22 UTC ( [id://1207579]=perlquestion: print w/replies, xml )

Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I want to do a few basic statistics (mean and stdev) and since I have lots of data I figured I would do it with Perl (using the Statistics::Basic module). However, I notice some quite interesting discrepancies as opposed to if I calculated the same values using Excel's formulas. Particularly I think the stdev is a problem, since in one test run I did, I got 0.401 with Excel and 0.41 with the Perl module, which does affect my calculations later on.
Do you have any comment on that?

Comment on Excel or Perl for simple statistics? Download Code

Replies are listed 'Best First'.

Re: Excel or Perl for simple statistics?
by tangent (Parson) on Jan 20, 2018 at 17:53 UTC

As pryrt points out, you can get Statistics::Basic to do both by setting UNBIAS to true, whereas Statistics::Lite gives you two separate functions. Note also that Statistics::Basic defaults to a precision of 2 decimal points - this can be changed by setting IPRES to the desired precision.

Just for fun I did a little comparison table:

Excel	STDEV	0.41068873	STDEVP	0.400791162
Open Office	STDEV	0.4106887305	STDEVP	0.400791162
Statistics::Lite	stddev()	0.410688730467897	stddevp()	0.400791162022762
Statistics::Basic	stddev() UNBIAS true	0.410688730467897	stddev()	0.400791162022762

Re: Excel or Perl for simple statistics?
by davies (Prior) on Jan 20, 2018 at 14:28 UTC

There's a difference, IIRC, between the population standard deviation and the sample standard deviation (again IIRC, this is sometimes called standard error - I'm sure a statistician will explain the issues). Excel has functions for both - stdev and stdeva. Are you sure which you want to use and that you are using the equivalent in both?

Regards,

John Davies

Update: I didn't mention stdevp as I didn't know about it. Thanks, poj.

Re^2: Excel or Perl for simple statistics?

by Anonymous Monk on Jan 20, 2018 at 16:10 UTC

Re^3: Excel or Perl for simple statistics?

by pryrt (Abbot) on Jan 20, 2018 at 17:36 UTC

I believe davies was wrong about it also being called the standard error: according to wp, Standard_error, the standard error is the standard deviation divided by the root of the number of samples.

The population standard deviation (Excel: =STDEVP() or =STDEV.P()) assumes you that have data for the entire population; the sample standard deviation (Excel: =STDEV() or =STDEV.S()) assumes that your n data points are just a sample of the population, so it uses a different denominator (n-1) inside the radical, to account for the uncertainty because it's just a sample, not the whole population. Given your set of data, the 0.401791 is the population standard deviation, and the 0.410689 is the sample standard deviation.

Population standard deviation: σ = sqrt( Σ{ (x_i - x_avg)² } / { n } )
Sample standard deviation: s = sqrt( Σ{ (x_i - x_avg)² } / { n - 1 } )

According to Statistics::Basic, you can set $Statistics::Basic::UNBIAS true to get the sample, or (implied) set it false to get the population. Statistics::Lite directly says that their stddev() function is the unbiased (sample) n-1, which matches the Excel =STDEV() function, and they have the stddevp() for the population n, matching the Excel =STDEVP().

[reply]
[d/l]
[select]

Re^3: Excel or Perl for simple statistics?

by Anonymous Monk on Jan 20, 2018 at 16:38 UTC

Statistics::Lite

Statistics::Lite

Statistics::Basic

0.964
0
0
0.94
0.895
0.915
0.775
0.868
0
0.796
0.866
0.819
0
0
0
0
0.794
0.806
0.781
0.807
0.783
[download]

[reply]
[d/l]
[select]

Re^4: Excel or Perl for simple statistics?

by poj (Abbot) on Jan 20, 2018 at 17:06 UTC

Re: Excel or Perl for simple statistics?
by karlgoethebier (Abbot) on Jan 21, 2018 at 10:57 UTC

"...mean and stdev..."

Being a statistics noob i had some success with Statistics::Descriptive.

Best regards, Karl

P.S.: For yet another Excel abuse see Alte Bücher ziehen in neue Bibliothek um. The article is in German, unfortunately.

ŤThe Crux of the Biscuit is the Apostropheť

perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

Re^2: Excel or Perl for simple statistics?

by afoken (Chancellor) on Jan 22, 2018 at 22:09 UTC

For yet another Excel abuse see Alte Bücher ziehen in neue Bibliothek um. The article is in German, unfortunately.

Quoting the article:

Für die Mitarbeiter der Spezialfirma bedeutet der Umzug einerseits schwere körperliche Arbeit. Aber sie müssen auch geistig fit sein und genau auf die Signaturen der Bücher achten, damit es am neuen Bibliotheksstandort kein Durcheinander gibt. Wo dort künftig welches Buch stehen wird, das wurde vorab festgelegt in einem virtuellen Regalbuch mithilfe einer Excel-Tabelle mit 75 000 Zeilen, erläutert Sören Kristensen.

So, essentially, someone has created a 75.000 line Excel sheet mapping from book signature to new rack and new position in the rack to help moving a library.

I would not call that abuse of Excel. I've seen much worse things done in Excel, pushing Excel (9x), Windows (9x), and the machine (Pentium class PC with minimal RAM) to their limits.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Back to Seekers of Perl Wisdom

Log In^?

Domain Nodelet^?

www.com | www.net | www.org

Node Status^?

node history
Node Type: perlquestion [id://1207579]
Approved by davies
Front-paged by Discipulus
help

Chatterbox^?

How do I use this? • Last hour • Other CB clients

Other Users^?

Others musing on the Monastery: (4)

As of 2024-04-25 16:37 GMT

Sections^?

Information^?

Find Nodes^?

Leftovers^?

Today I Learned

Voting Booth^?

No recent polls found