http://qs321.pair.com?node_id=1188320


in reply to Re^5: using Statistics::Regression
in thread using Statistics::Regression

Yeah, I remembered that when I bombed out on a case with two samples, so I added a guard clause: I don't try to use linear regression (LR) if I have fewer than 10 samples. There still appear to be some very degenerate cases that kill this module even with plenty of samples. I have now updated my code to divide Max(Y) by Min(Y) and use this 'change' value to set the number of terms I use. Strangely, it looks like cases with a sudden step change in the data trigger this failure; I have one case with a 56% change that kills the module.

```perl
use POSIX qw(mktime);
use Statistics::Regression;

# Let's play with the order based on the Max/Min value
my $change = $data->{Max}{AVG_Percentage_Used} / $data->{Min}{AVG_Percentage_Used};
print "Change is $change\n";
my $order = 1;
$order = 2 if $change > 1.05;
$order = 3 if $change > 1.15;
$order = 4 if $change > 1.30;

my @Thetas = 'Const';                        # set Thetas for zero order
push @Thetas, 'Theta' . $_ for 1 .. $order;
my $reg = Statistics::Regression->new( $data->{Name}, \@Thetas );

# Add data points
for ( @{ $data->{values} } ) {
    # $s, $m, $h, $D, $M, $Y come from the record (parsing not shown);
    # POSIX::mktime expects the year as years since 1900
    my $epoch = mktime( $s, $m, $h, $D, $M - 1, $Y );
    my $x     = $_->[2];
    my @Data  = 1;                           # constant term
    push @Data, $x**$_ for 1 .. $order;      # x, x**2, ... up to $order
    print "\$reg->include ( $epoch, [" . ( join ", ", @Data ) . " ] )\n";
    $reg->include( $epoch, \@Data );
}
print "Results are ...\n";
$reg->print();
```

That appears to guard against the cases where there was very little movement in Y over the series, but this one still kills it. The 'Change' value in this debug output is Max/Min, so here there is a 30% change:

```
Change is 1.3
$reg->include ( 1491859118, [1, 3.25, 10.5625 ] )
$reg->include ( 1491902520, [1, 3.25, 10.5625 ] )
$reg->include ( 1492032609, [1, 2.5, 6.25 ] )
$reg->include ( 1492117432, [1, 2.5, 6.25 ] )
$reg->include ( 1492204208, [1, 2.5, 6.25 ] )
$reg->include ( 1492291088, [1, 2.5, 6.25 ] )
$reg->include ( 1492377875, [1, 2.5, 6.25 ] )
$reg->include ( 1492464416, [1, 2.5, 6.25 ] )
$reg->include ( 1492551241, [1, 2.5, 6.25 ] )
$reg->include ( 1492623578, [1, 2.5, 6.25 ] )
Results are ...
****************************************************************
Regression 'gotsvl2143.dpcmsr1t.TOOLS'
****************************************************************
Report.pl::Statistics::Regression:standarderrors: I cannot compute the theta-covariance matrix for variable 3 0
```

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!

Re^7: using Statistics::Regression
by Anonymous Monk on Apr 19, 2017 at 21:00 UTC
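    It's not the amount of change that is the problem; it's the number of distinct values. In your last example there are only 2 different values of x (2.5 and 3.25), so you can't get more than a linear fit. This is a fundamental limitation of the underlying mathematics, not of the module: with only two distinct x values, the x² column is an exact linear combination of the constant and x columns, so its coefficient cannot be determined.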
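The point about distinct values can be checked numerically. This is a minimal pure-Perl sketch (the `matrix_rank` helper is hypothetical, not part of Statistics::Regression) that builds the `[1, x, x², ...]` rows from the ten x values in the debug output and computes the rank of that design matrix by Gaussian elimination with partial pivoting. With only two distinct x values the rank never exceeds 2, because for x ∈ {a, b} every higher power is a combination of 1 and x (for example x² = (a + b)x − ab), so any theta beyond the linear term has no unique solution.

```perl
use strict;
use warnings;

# Rank of a matrix (arrayref of row arrayrefs) via Gaussian elimination
# with partial pivoting; entries smaller than $tol count as zero.
sub matrix_rank {
    my ( $rows, $tol ) = @_;
    $tol //= 1e-9;
    my @m     = map { [@$_] } @$rows;    # work on a copy
    my $ncols = @{ $m[0] };
    my $rank  = 0;
    for my $col ( 0 .. $ncols - 1 ) {
        last if $rank >= @m;
        # choose the remaining row with the largest entry in this column
        my ($best) = sort { abs( $m[$b][$col] ) <=> abs( $m[$a][$col] ) } $rank .. $#m;
        next if abs( $m[$best][$col] ) < $tol;    # no pivot: dependent column
        @m[ $rank, $best ] = @m[ $best, $rank ];
        for my $r ( $rank + 1 .. $#m ) {
            my $f = $m[$r][$col] / $m[$rank][$col];
            $m[$r][$_] -= $f * $m[$rank][$_] for $col .. $ncols - 1;
        }
        $rank++;
    }
    return $rank;
}

# x values from the debug output: two at 3.25, eight at 2.5
my @x = ( (3.25) x 2, (2.5) x 8 );
for my $order ( 1 .. 3 ) {
    my @X = map { my $v = $_; [ map { $v**$_ } 0 .. $order ] } @x;
    printf "order %d: %d columns, rank %d\n", $order, $order + 1, matrix_rank( \@X );
}
```

Every order prints rank 2, so only the constant and linear thetas are identifiable from this data.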

      Thank you, I see what you mean now.

      I guess I expected to get a theta of zero for all the terms but the constant if there was not enough data for more than a linear fit. I have now updated the code to count the number of unique values and never ask for too many terms, and the script runs happily on all my data. The predictions are still a little wild, though; I suspect I may be overfitting. I don't suppose you spotted any undocumented feature for adding a regularisation term in the code? :)
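Counting unique values and capping the number of terms, plus the regularisation question, can be sketched together in pure Perl. This is a hypothetical sketch, not the poster's code and not part of Statistics::Regression: `max_supported_order` and `choose_order` reuse the 10-sample guard and the Max/Min thresholds from the earlier post and cap the order at one less than the number of distinct x values; `ridge_fit` is a swapped-in, hand-rolled ridge (L2-penalised) least-squares solver that adds `lambda` to the diagonal of XᵀX for every term except the constant. The value of `lambda` is an assumption to be tuned, and the sketch does not centre or scale the power columns, which a real ridge fit usually should, since x, x², x³ sit on very different scales.

```perl
use strict;
use warnings;

# With k distinct x values, at most a degree k-1 polynomial is identifiable.
sub max_supported_order {
    my %seen;
    $seen{$_}++ for @_;
    return scalar( keys %seen ) - 1;
}

# Thresholds and 10-sample guard from the earlier post, capped by the data.
sub choose_order {
    my ( $change, @x ) = @_;
    return if @x < 10;    # too few samples: don't fit at all
    my $order = 1;
    $order = 2 if $change > 1.05;
    $order = 3 if $change > 1.15;
    $order = 4 if $change > 1.30;
    my $cap = max_supported_order(@x);
    return $order < $cap ? $order : $cap;
}

# Ridge least squares: solve (X'X + lambda*P) theta = X'y, where P is the
# identity except that the constant (column 0) is not penalised.
sub ridge_fit {
    my ( $X, $y, $lambda ) = @_;
    my $p = @{ $X->[0] };
    my @A = map { [ (0) x ( $p + 1 ) ] } 1 .. $p;    # augmented [X'X | X'y]
    for my $i ( 0 .. $#$X ) {
        for my $r ( 0 .. $p - 1 ) {
            $A[$r][$_] += $X->[$i][$r] * $X->[$i][$_] for 0 .. $p - 1;
            $A[$r][$p] += $X->[$i][$r] * $y->[$i];
        }
    }
    $A[$_][$_] += $lambda for 1 .. $p - 1;
    for my $c ( 0 .. $p - 1 ) {    # elimination with partial pivoting
        my ($best) = sort { abs( $A[$b][$c] ) <=> abs( $A[$a][$c] ) } $c .. $p - 1;
        @A[ $c, $best ] = @A[ $best, $c ];
        die "singular system; reduce the order or increase lambda\n"
            if abs( $A[$c][$c] ) < 1e-12;
        for my $r ( $c + 1 .. $p - 1 ) {
            my $f = $A[$r][$c] / $A[$c][$c];
            $A[$r][$_] -= $f * $A[$c][$_] for $c .. $p;
        }
    }
    my @theta = (0) x $p;
    for my $r ( reverse 0 .. $p - 1 ) {    # back substitution
        my $s = $A[$r][$p];
        $s -= $A[$r][$_] * $theta[$_] for $r + 1 .. $p - 1;
        $theta[$r] = $s / $A[$r][$r];
    }
    return \@theta;
}

# The failing case: Max/Min thresholds ask for more, the data allows order 1
my @x = ( (3.25) x 2, (2.5) x 8 );
my @y = ( 1491859118, 1491902520, 1492032609, 1492117432, 1492204208,
          1492291088, 1492377875, 1492464416, 1492551241, 1492623578 );
my $order = choose_order( 1.3, @x );
my @X     = map { my $v = $_; [ map { $v**$_ } 0 .. $order ] } @x;
my $theta = ridge_fit( \@X, \@y, 0 );    # lambda = 0 is plain least squares
print "order = $order\n";
printf "theta = [%s]\n", join ', ', map { sprintf '%.6g', $_ } @$theta;
```

Raising `lambda` above zero shrinks the non-constant thetas toward zero, which trades a little bias for less wild extrapolation.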

      Cheers,
      R.

      Pereant, qui ante nos nostra dixerunt!