PerlMonks  

Re^4: Time series normalization

by 0xbeef (Hermit)
on Jul 16, 2009 at 18:37 UTC ( [id://780809] )


in reply to Re^3: Time series normalization
in thread Time series normalization

I am using your method (except for formatting differences as per what GD::Graph wants) for daily graphs, but the graphs I am referring to here are for a long-term trend per managed system.

Each managed system could easily contain 20 logical partitions (LPARs), and a 3-month trend works out to roughly 3,500-5,000 values per LPAR.

Using the "select just the 100% common times" method takes 30-odd seconds to produce such a graph for nearly 20 members of the managed system, and even getting it that fast took quite a bit of SQLite3 tuning.
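For what it's worth, the "100% common times" selection can also be done in a single pass in Perl once the rows are fetched, by counting how many series contain each timestamp. A minimal sketch with made-up sample data (the `%series` layout is an assumption, not my actual schema):

```perl
use strict;
use warnings;

# Hypothetical data: one hash of timestamp => value per LPAR series.
my %series = (
    lpar1 => { 100 => 1.2, 160 => 1.4, 220 => 1.1 },
    lpar2 => { 100 => 0.8, 160 => 0.9, 280 => 1.0 },
    lpar3 => { 100 => 2.1, 160 => 2.0, 220 => 1.9 },
);

my $n = keys %series;    # number of series

# Count how many series report each timestamp.
my %count;
$count{$_}++ for map { keys %$_ } values %series;

# Timestamps present in every series: the "100% common" set.
my @common = sort { $a <=> $b } grep { $count{$_} == $n } keys %count;
```

This is O(total samples) with one hash, which avoids the self-join-style comparison in SQL.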

The really big problem with just inserting undefs would be the number of samples. If even one server were set to gather stats at a short interval, every other server running at a different interval would have to be padded with extra empty values.

I would therefore be inclined to discard the times for which fewer than x% of hosts have values, but is this the best solution? Since I have the number of samples per data series, is there no way to fit each data series between a start and end time using some sort of approximation or mathematical transform?
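The two ideas above can be combined: keep only timestamps that enough hosts report, then linearly interpolate each series onto that surviving grid so no series has gaps. A hedged sketch, again with made-up data and an assumed 60% coverage cutoff (linear interpolation is just one possible transform; splines or time-bucket averaging would also work):

```perl
use strict;
use warnings;

my $threshold = 0.6;    # keep times that >= 60% of hosts report (assumed cutoff)

# Hypothetical per-LPAR samples: timestamp => value.
my %series = (
    lpar1 => { 100 => 1.2, 160 => 1.4, 220 => 1.1 },
    lpar2 => { 100 => 0.8, 160 => 0.9, 280 => 1.0 },
    lpar3 => { 100 => 2.1, 160 => 2.0, 220 => 1.9 },
);

my $n = keys %series;
my %count;
$count{$_}++ for map { keys %$_ } values %series;

# Grid of timestamps with sufficient coverage (280 is reported by only
# 1 of 3 hosts here, so it is discarded).
my @grid = sort { $a <=> $b }
           grep { $count{$_} >= $threshold * $n } keys %count;

# Fill each series' gaps on the grid by linear interpolation between its
# nearest real samples; grid points outside a series' range stay undef.
for my $s (values %series) {
    my @have = sort { $a <=> $b } keys %$s;
    for my $t (@grid) {
        next if exists $s->{$t};
        my ($lo) = grep { $_ < $t } reverse @have;   # nearest sample before $t
        my ($hi) = grep { $_ > $t } @have;           # nearest sample after $t
        if ( defined $lo && defined $hi ) {
            my $f = ( $t - $lo ) / ( $hi - $lo );
            $s->{$t} = $s->{$lo} + $f * ( $s->{$hi} - $s->{$lo} );
        }
    }
}
```

After this pass every series has a value (or an edge undef) at every grid point, so the graphing code sees uniform-length data without the sample explosion of padding to the shortest interval.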

Niel

Replies are listed 'Best First'.
Re^5: Time series normalization
by jrsimmon (Hermit) on Jul 16, 2009 at 19:00 UTC

    I'm still not convinced that you're not making the problem harder than it should be. A select that has to compare all entries for one key against all entries of every other key, even if the db is indexed by that key, is fairly intensive. It doesn't surprise me that it required some tuning to get the time down. Simply populating a hash, though, with the values you get and undef as placeholders, should be quite efficient.
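    To make the hash suggestion concrete, here is a minimal sketch (illustrative data, not the OP's schema) that merges per-LPAR samples into the parallel-array shape GD::Graph's plot() takes, with undef standing in for missing points:

```perl
use strict;
use warnings;

# Hypothetical per-LPAR samples: timestamp => value.
my %series = (
    lpar1 => { 100 => 1.2, 160 => 1.4 },
    lpar2 => { 100 => 0.8, 220 => 1.0 },
);

# Union of all timestamps, sorted once.
my %seen;
$seen{$_} = 1 for map { keys %$_ } values %series;
my @times = sort { $a <=> $b } keys %seen;

my @names = sort keys %series;
my @data  = ( \@times );    # first row: x-axis labels
for my $name (@names) {
    # Missing timestamps yield undef placeholders automatically.
    push @data, [ map { $series{$name}{$_} } @times ];
}
```

    This is one pass over the data, no joins; whether the extra undef rows are acceptable depends on how far apart the sampling intervals really are.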

    That said, you might check Chart::Plot to see if it will do what you want. It does not require uniform length data sets, per the doc.
