Re^2: Time series normalization
by 0xbeef (Hermit) on Jul 16, 2009 at 16:04 UTC
Alas, things are not as simple as this:
My usual objective is to produce line graphs for PCPU, a metric that records the physical CPUs consumed by a server over a given time interval. I also need to plot the sum of the pcpu values for all of the servers on the same graph. The data-series array is acquired with one time/value SELECT per host.
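A minimal sketch of that acquisition step (the schema, table and column names here are assumptions for illustration, not the actual database, and the sketch is in Python rather than Perl):

```python
import sqlite3

# Hypothetical schema: metrics(host TEXT, time TEXT, pcpu REAL).
# The table name, columns, and sample data are all made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (host TEXT, time TEXT, pcpu REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", [
    ("host1", "08:01", 0.42), ("host1", "08:02", 0.40),
    ("host1", "08:03", 0.45), ("host1", "08:04", 0.43),
    ("host2", "08:00", 0.10), ("host2", "08:01", 0.12),
    ("host2", "08:04", 0.11),
])

def series_for(host):
    """One SELECT per host, returning its ordered (time, value) pairs."""
    cur = conn.execute(
        "SELECT time, pcpu FROM metrics WHERE host = ? ORDER BY time",
        (host,))
    return cur.fetchall()
```

Each call hands back one host's series independently, which is exactly why the timestamps of two hosts need not line up.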
So the host1 result could contain time/value pairs for 08:01, 08:02, 08:03 and 08:04, while the host2 result might contain only 08:00, 08:01 and 08:04. There are no undefined values to begin with, and interpolating the missing times with undefined values at the SQL level would very likely make the SELECT even more expensive. And make my head hurt while trying to understand how to do it... :)
I could also easily ensure that there would never be an undef value by setting $percent_overlap_required to 100, but that is unfortunately unrealistic in terms of how things work: a server might be offline, or have a slightly different clock than another, and the more servers are involved, the greater the probability that some metric is missing.
After the SELECT, I could build a list containing just those times/values common to 80% of the hosts, but there are so many values involved that memory becomes a concern.
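The memory cost of that filtering step can be kept small: rather than holding the merged series, a single pass per host can count how many hosts report each timestamp, keeping only one integer per distinct time. A sketch, again with made-up data and a parameter named after the poster's $percent_overlap_required:

```python
from collections import Counter

# Hypothetical per-host (time, value) series; the data is made up.
series_by_host = {
    "host1": [("08:01", 0.42), ("08:02", 0.40),
              ("08:03", 0.45), ("08:04", 0.43)],
    "host2": [("08:00", 0.10), ("08:01", 0.12), ("08:04", 0.11)],
}

def common_times(series_by_host, percent_overlap_required):
    """Count how many hosts report each timestamp, then keep the
    timestamps seen on at least the required fraction of hosts.
    Memory use is one counter entry per distinct timestamp, not the
    full cross-product of hosts and times."""
    counts = Counter()
    for series in series_by_host.values():
        counts.update(t for t, _ in series)
    needed = len(series_by_host) * percent_overlap_required / 100.0
    return sorted(t for t, c in counts.items() if c >= needed)
```

With these two hosts, requiring 100% overlap keeps only 08:01 and 08:04; dropping the threshold to 50% keeps every timestamp.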
My gut feeling is that the way forward involves normalizing the values, but I am struggling to understand how that is done given what I have to start out with.
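One possible reading of that normalization, sketched under the same assumptions as above (made-up data, missing values skipped rather than counted as zero): sum pcpu across hosts per timestamp, but emit a point for the summed graph only where the overlap threshold is met, so partially covered points are dropped instead of silently under-reported.

```python
from collections import defaultdict

# Hypothetical per-host (time, value) series; the data is made up.
series_by_host = {
    "host1": [("08:01", 0.42), ("08:02", 0.40),
              ("08:03", 0.45), ("08:04", 0.43)],
    "host2": [("08:00", 0.10), ("08:01", 0.12), ("08:04", 0.11)],
}

def summed_pcpu(series_by_host, percent_overlap_required):
    """Accumulate per-timestamp totals and the number of hosts reporting
    at each timestamp, then keep only the timestamps where enough hosts
    contributed a value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for series in series_by_host.values():
        for t, v in series:
            totals[t] += v
            counts[t] += 1
    needed = len(series_by_host) * percent_overlap_required / 100.0
    return [(t, totals[t]) for t in sorted(totals) if counts[t] >= needed]
```

This is only one policy choice; another would be to interpolate each host's missing points from its neighbours before summing, at the cost of inventing values.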