### Re^4: using Statistics::Regression

by Random_Walk (Prior)
 on Apr 19, 2017 at 19:52 UTC (#1188310)

in reply to Re^3: using Statistics::Regression

Thank you so much, Anonymonk; now I am getting somewhere. The fragment of code I am now using, culled from a larger script, goes like this:

```perl
# OK, now let's use linear regression to fit
my $reg = Statistics::Regression->new( $data->{Name}, [ "Const", "Theta1", "Theta2" ] );

for ( @{$data->{values}} ) {
    # some time conversion goes on here to make times into Epoch ...
    my $epoch = mktime($s, $m, $h, $D, $M-1, $Y);
    my $x = $_->[2];
    print "\$reg->include ( $epoch, [1, $x, ". $x**2 ." ] )\n";
    $reg->include ( $epoch, [1, $x, $x**2 ] );
}
print "Results are ...\n";
$reg->print();
```

Here is some output. Mostly it is working now, but on one set of data it chokes:

```text
# This is printed by the lines above, this one works fine...

$reg->include ( 1491858157, [1, 95.24, 9070.6576 ] )
$reg->include ( 1491944593, [1, 95.24, 9070.6576 ] )
$reg->include ( 1492030986, [1, 95.22, 9066.8484 ] )
$reg->include ( 1492117236, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492203637, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492290038, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492376435, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492462840, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492549241, [1, 95.23, 9068.7529 ] )
$reg->include ( 1492621259, [1, 95.24, 9070.6576 ] )
Results are ...
****************************************************************
Regression '3116.dpepicqt.SYSAUX'
****************************************************************
Name            Theta                   StdErr                  T-stat
[0='const']     -22405806112434.5120    16790271247707.7460     -1.33
[1='Theta1']    470587750270.0963       352619498612.0823       1.33
[2='Theta2']    -2470766737.1275        1851377334.2664         -1.33

R^2= 0.206, N= 10, K= 3
****************************************************************

# This one chokes ...

$reg->include ( 1491858157, [1, 93.6, 8760.96 ] )
$reg->include ( 1491944593, [1, 93.6, 8760.96 ] )
$reg->include ( 1492030986, [1, 93.6, 8760.96 ] )
$reg->include ( 1492117236, [1, 93.6, 8760.96 ] )
$reg->include ( 1492203637, [1, 93.6, 8760.96 ] )
$reg->include ( 1492290038, [1, 93.6, 8760.96 ] )
$reg->include ( 1492376435, [1, 93.6, 8760.96 ] )
$reg->include ( 1492462840, [1, 93.6, 8760.96 ] )
$reg->include ( 1492549241, [1, 93.6, 8760.96 ] )
$reg->include ( 1492621259, [1, 93.64, 8768.4496 ] )
Results are ...
****************************************************************
Regression '3116.dpepicqt.SYSTEM'
****************************************************************
Report.pl::Statistics::Regression:standarderrors: I cannot compute the theta-covariance matrix for variable 3 0
 at C:/Perl64/site/lib/Statistics/Regression.pm line 619.
    Statistics::Regression::standarderrors(Statistics::Regression=HASH(0x44dfe90)) called at C:/Perl64/site/lib/Statistics/Regression.pm line 430
    Statistics::Regression::print(Statistics::Regression=HASH(0x44dfe90)) called at Report.pl line 125
    main::predict(HASH(0x4340ec8), 10) called at Report.pl line 85
```

I am guessing I may not have enough variation in that data for it to find an optimum, but if anyone can see I am barking up the wrong tree, please do shout.

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!

### Update

I have now tried it with a cubic term, and it failed on an earlier data set. Then I tried it with just the Constant and an X term, no square or higher, and it ran the complete set. So now I can get a best-fit line. Next step is to see if I can feed it some guess values for the theta vector.

Re^5: using Statistics::Regression
by Anonymous Monk on Apr 19, 2017 at 20:21 UTC
You need at least 3 distinct values of x to produce a quadratic fit. Similarly, you need at least 4 for a cubic, and at least 2 for a linear fit.
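A minimal sketch of that rule in plain Perl, in case it helps as a guard before calling the module. The helper name `max_poly_order` is made up for illustration and is not part of Statistics::Regression. It counts the distinct x values and caps the polynomial order at one less than that count. Distinctness is judged on Perl's string form of each number, which is an assumption that may or may not suit your data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Regression): the highest
# polynomial order that a list of x values can support.
sub max_poly_order {
    my @x = @_;
    my %seen;
    $seen{$_}++ for @x;          # hash keys collapse repeated x values
    my $distinct = keys %seen;   # number of distinct x values
    return $distinct - 1;        # order n needs n+1 distinct values (-1 if empty)
}

# Ten samples, but only two distinct x values.
my @x      = (3.25, 3.25, (2.5) x 8);
my $wanted = 2;                              # we would like a quadratic
my $cap    = max_poly_order(@x);
my $order  = $wanted < $cap ? $wanted : $cap;

print "distinct-limited order: $order\n";    # prints "distinct-limited order: 1"
```

Passing this check is necessary but not a guarantee: values that are distinct yet very close together may still leave the fit numerically ill-conditioned.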

Yeah, I remembered that when I bombed out on a case with two samples, so I have added a guard clause so that I do not try to use LR if I have fewer than 10 samples. It appears there are still some very degenerate cases that kill this module even with plenty of samples. I have now updated my code to divide Max(Y) by Min(Y) and use this 'change' value to alter the number of terms I use. Strangely, it looks like it is cases with a sudden step change in the data that trigger this failure; I have one case where there is a 56% change and that kills the module.

```perl
# Let's play with order based on Max/Min value
my $change = $data->{Max}{AVG_Percentage_Used} / $data->{Min}{AVG_Percentage_Used};
print "Change is $change\n";
my $order = 1;
$order = 2 if $change > 1.05;
$order = 3 if $change > 1.15;
$order = 4 if $change > 1.30;

my @Thetas = 'Const'; # Set Thetas for zero order
push @Thetas, 'Theta'.$_ for 1 .. $order;

my $reg = Statistics::Regression->new( $data->{Name}, \@Thetas );

for ( @{$data->{values}} ) {
    my $epoch = mktime($s, $m, $h, $D, $M-1, $Y);
    my $x = $_->[2];
    my @Data = 1;
    push @Data, $x**$_ for 1..$order;
    print "\$reg->include ( $epoch, [".(join ", ", @Data)." ] )\n";
    $reg->include ( $epoch, \@Data );
}
print "Results are ...\n";
$reg->print();
```

That appears to guard the cases where I had very little movement in Y over the series, but this one still kills it. The value 'Change' in this debug output is Max/Min, so here there is a 30% change.

```text
Change is 1.3
$reg->include ( 1491859118, [1, 3.25, 10.5625 ] )
$reg->include ( 1491902520, [1, 3.25, 10.5625 ] )
$reg->include ( 1492032609, [1, 2.5, 6.25 ] )
$reg->include ( 1492117432, [1, 2.5, 6.25 ] )
$reg->include ( 1492204208, [1, 2.5, 6.25 ] )
$reg->include ( 1492291088, [1, 2.5, 6.25 ] )
$reg->include ( 1492377875, [1, 2.5, 6.25 ] )
$reg->include ( 1492464416, [1, 2.5, 6.25 ] )
$reg->include ( 1492551241, [1, 2.5, 6.25 ] )
$reg->include ( 1492623578, [1, 2.5, 6.25 ] )
Results are ...
****************************************************************
Regression 'gotsvl2143.dpcmsr1t.TOOLS'
****************************************************************
Report.pl::Statistics::Regression:standarderrors: I cannot compute the theta-covariance matrix for variable 3 0
```

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!
It's not the amount of change that is the problem, it's the number of distinct values. In your last example, there are only 2 different values of x (2.5 and 3.25), so you can't get more than a linear fit. This is a fundamental limitation of the underlying mathematics.
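To make that concrete, here is a small check in plain Perl (no module needed) on the x values from the TOOLS data set. When x only ever takes two values p and q, every sample satisfies (x - p)(x - q) = 0, so x**2 equals (p + q)*x - p*q exactly. The squared column is then just a combination of the constant and linear columns and carries no new information, which is presumably what the theta-covariance error is reporting. The variable names below are illustrative only.

```perl
use strict;
use warnings;

# x values from the failing TOOLS data set above.
my @x = (3.25, 3.25, (2.5) x 8);
my ($p, $q) = (2.5, 3.25);    # the only two distinct values

# For every sample, compare x**2 with what the lower-order terms
# already give: (p + q)*x - p*q.
my $max_gap = 0;
for my $x (@x) {
    my $from_lower_terms = ($p + $q) * $x - $p * $q;
    my $gap = abs($x**2 - $from_lower_terms);
    $max_gap = $gap if $gap > $max_gap;
    printf "x=%-5s x^2=%-8s (p+q)*x - p*q=%s\n", $x, $x**2, $from_lower_terms;
}

# Zero mismatch means the squared column is redundant for this data.
print "largest mismatch: $max_gap\n";
```

The same argument extends upward: with k distinct x values, any power of x from k onward can be rewritten in terms of the lower powers on those samples, so at most k coefficients (Const plus k-1 Thetas) can be determined.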

