Time series normalization
by 0xbeef (Hermit)
on Jul 16, 2009 at 14:29 UTC
0xbeef has asked for the wisdom of the Perl Monks concerning the following question:
Be warned in advance, I have no education in statistics and might be using the wrong terminology... :)
I store system performance metrics as an epoch-based time series in an SQLite database, which may be missing data points at any given time. Simplified, it could look like this:
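For illustration (the column names and values here are my own, not the actual schema), a layout where the Val2 series is missing a point might be:

```
epoch        Val1   Val2
1247749200   10     20
1247749500   12     (missing)
1247749800   11     22
```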
I'm using GD::Graph to plot each individual data set, in some cases along with the summed values, in a line graph. The problem is that feeding the Val2 series as-is to GD results in a shorter line over time than the Val1 series, with its values no longer corresponding to the correct times on the x-axis.
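One workaround before any maths: if memory serves, GD::Graph accepts undef in a data series (and with its skip_undef option will simply leave a gap in the line), so padding the sparse series with undef at the missing epochs keeps both lines aligned on the x-axis. A minimal sketch, with made-up epochs and values:

```perl
use strict;
use warnings;

# Minimal sketch (epochs and values are invented): pad the sparse series
# with undef so every series shares the same x-axis positions.
my @epochs = (1000, 1300, 1600);           # the full time axis
my %val2   = (1000 => 20, 1600 => 22);     # Val2 is missing a point at 1300
my @val2_aligned = map { exists $val2{$_} ? $val2{$_} : undef } @epochs;
# @val2_aligned is now (20, undef, 22)
```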
What I do now is a complex SELECT that returns values for just the common time points and feeds those to GD, but it is quite inaccurate, and slow due to the number of data points involved. Some sort of mathematical transformation could probably work better (pity I know very little about them).
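For what it's worth, a LEFT JOIN might be cheaper than restricting to common time points, since it keeps every epoch from one side and yields NULL (undef in Perl) where the other series has no row. The table and column names below are my own guesses, not the actual schema:

```perl
use strict;
use warnings;

# Hypothetical table/column names: one "metrics" table holding both
# series, keyed by (metric, epoch). The LEFT JOIN keeps all Val1 epochs
# and returns NULL for val2 wherever Val2 has no matching row.
my $sql = <<'SQL';
SELECT a.epoch, a.value AS val1, b.value AS val2
FROM   metrics a
LEFT JOIN metrics b
       ON b.epoch = a.epoch AND b.metric = 'Val2'
WHERE  a.metric = 'Val1'
ORDER BY a.epoch
SQL
```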
I'm guessing I need to normalize each data-set across the overall time-range, and do a linear interpolation for each missing value, but the normalization is a bit beyond me.
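The interpolation step needn't involve any real normalization machinery: for each missing time, take the nearest known points on either side and interpolate linearly between them. A minimal sketch (the helper name and data layout are my own):

```perl
use strict;
use warnings;

# Sketch: given a sorted list of all epochs and a sparse series as a
# hash of epoch => value, return one value per epoch, linearly
# interpolating across the gaps.
sub interpolate_series {
    my ($times, $series) = @_;
    my @known = grep { defined $series->{$_} } @$times;
    my @out;
    for my $t (@$times) {
        if (defined $series->{$t}) {
            push @out, $series->{$t};
            next;
        }
        # nearest known epochs below and above $t
        my ($lo) = grep { $_ < $t } reverse @known;
        my ($hi) = grep { $_ > $t } @known;
        if (defined $lo && defined $hi) {
            my $frac = ($t - $lo) / ($hi - $lo);
            push @out, $series->{$lo} + $frac * ($series->{$hi} - $series->{$lo});
        }
        else {
            push @out, undef;   # don't extrapolate past either end
        }
    }
    return \@out;
}

# e.g. interpolate_series([0, 10, 20], { 0 => 0, 20 => 10 })
#      fills the gap at t=10 with the midpoint value 5
```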
It seems Data::TimeSeries could do something like this, but my timestamps can be as granular as a few minutes, and the module only seems to support HOURS as a period.
I've read a bit about RRDTool, and it sounds like it might be a great alternative to using a database altogether, especially for reducing disk space usage, but rewriting my code is a bit more involved than I'd prefer right now.
Does anyone perhaps know of another fairly efficient way to normalize data like this?