in reply to Re: Time series normalization
in thread Time series normalization
My objective is typically to produce line graphs for PCPU, a metric that stores the number of physical CPUs consumed by a server over a given time interval. I also need to plot the sum of the pcpu values for all of the servers on the same graph. The data-series arrays are acquired similar to this:
    $common_times = "SELECT timestamp FROM pcpu
                     WHERE (nethost IN ($host_list))
                     GROUP BY timestamp
                     HAVING (count(*) >= $percent_overlap_required)";

    $result = $$dbh->selectall_arrayref(
        "SELECT nethost, timestamp, val FROM pcpu
         WHERE (nethost IN ($host_list) AND timestamp IN ($common_times))
         GROUP BY nethost, timestamp;"
    );
So the host1 result could contain time/value pairs for 08:01, 08:02, 08:03 and 08:04, while the host2 result might contain 08:00, 08:01 and 08:04. There are no undefined values to begin with, and interpolating the missing times with undefined values at the SQL level would very likely make the SELECT even more expensive - and make my head hurt while trying to work out how to do it... :)
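To make the mismatch concrete, here is a quick sketch (in Python rather than Perl, purely for brevity; the host names and values are invented) of aligning both result sets on the union of their timestamps, with undefined values filling the gaps:

```python
# Illustrative sketch, not the real query results: align per-host series
# on a shared time axis, filling gaps with None so every series has a
# value slot for every timestamp.
host1 = {"08:01": 1.2, "08:02": 1.4, "08:03": 1.1, "08:04": 1.3}
host2 = {"08:00": 0.9, "08:01": 1.0, "08:04": 1.1}

# The union of all timestamps becomes the common x-axis.
axis = sorted(set(host1) | set(host2))

aligned = {
    "host1": [host1.get(t) for t in axis],  # None where host1 has no sample
    "host2": [host2.get(t) for t in axis],
}
print(axis)              # ['08:00', '08:01', '08:02', '08:03', '08:04']
print(aligned["host2"])  # [0.9, 1.0, None, None, 1.1]
```

Many plotting layers will simply skip undefined points, so a shared axis like this may already be enough to draw all the series on one graph.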
I could also easily ensure that there would never be an undef value by setting $percent_overlap_required to 100, but that is unfortunately not realistic: a server might be offline, or its clock might differ slightly from another's, and the more servers are involved, the greater the probability of a missing metric.
After the SELECT, I could build a list containing just those times/values common to at least 80% of the hosts, but there are so many values involved that memory becomes a concern.
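That said, the 80% filter itself should not require keeping every row around: counting occurrences per timestamp while streaming through the rows needs only one hash entry per distinct time. A sketch (again Python for brevity; the rows, host count and threshold are made up):

```python
from collections import Counter

# Illustrative sketch: count how many hosts reported each timestamp while
# streaming rows, instead of holding every (host, time, value) row in memory.
rows = [  # (host, timestamp) pairs as they come back from the database
    ("host1", "08:01"), ("host1", "08:02"), ("host1", "08:03"), ("host1", "08:04"),
    ("host2", "08:00"), ("host2", "08:01"), ("host2", "08:04"),
]
n_hosts = 2
percent_overlap_required = 80  # same threshold as in the query above

seen = Counter(ts for _, ts in rows)
common = sorted(t for t, c in seen.items()
                if c * 100 >= n_hosts * percent_overlap_required)
print(common)  # ['08:01', '08:04'] - times present on >= 80% of the two hosts
```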
My gut feel is that the way forward involves normalizing the values, but I am struggling to see how that is done given what I have to start out with.
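For what it's worth, one way I can imagine "normalizing" the gaps afterwards is linear interpolation between the nearest defined neighbours (Python sketch with the same invented numbers; leading and trailing gaps stay undefined):

```python
# Illustrative sketch: fill None gaps by linear interpolation between the
# nearest defined neighbours; gaps before the first or after the last
# defined value are left as None.
def interpolate(values):
    vals = list(values)
    known = [i for i, v in enumerate(vals) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (vals[b] - vals[a]) / (b - a)
        for i in range(a + 1, b):
            vals[i] = vals[a] + step * (i - a)
    return vals

print(interpolate([0.9, 1.0, None, None, 1.1]))  # gaps filled linearly
```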
Regards,
Niel
Re^3: Time series normalization
by jrsimmon (Hermit) on Jul 16, 2009 at 16:34 UTC
by 0xbeef (Hermit) on Jul 16, 2009 at 18:37 UTC
by jrsimmon (Hermit) on Jul 16, 2009 at 19:00 UTC