http://qs321.pair.com?node_id=332004

blue_cowdawg has asked for the wisdom of the Perl Monks concerning the following question:

I have some ideas on how to do what I want, but I thought I would pose this question as I know that some of the monks may come up with better ideas than mine.

The problem

I have collected some data (blood glucose levels) over about a 4 week period of time. What I would like to do is average those readings by time of day. For instance, here is a subset of the data I have collected:
 

Blood Glucose by Date

Date and Time       Blood Glucose (mg/dL)
2004-02-01 07:01     82
2004-02-01 11:38    172
2004-02-01 22:48    154
2004-02-02 05:38    107
2004-02-02 13:20    117
2004-02-02 23:48    188
As you can see from this subset of the data, my sample times do not occur at exactly the same times every day; they depend on my schedule, when I get up, what dumb meetings interrupt my flow, etc.

What I want to do is average these readings by time of day, so that I calculate averages from 4:30AM (when I get up on days I go to the gym) all the way through until midnight. Some interpolation is going to be called for, since I do not test every hour and my samples are essentially 3 to 4 a day (more when I feel crappy).

Thoughts anyone?


Peter L. Berghold -- Unix Professional
Peter at Berghold dot Net
   Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.

Re: Data averages by time of day
by Abigail-II (Bishop) on Feb 26, 2004 at 15:39 UTC
    It's not clear to me what you really want. Perhaps you want "averages" of certain pre-picked points in time. Say, the average of 7AM, 11AM, 3PM, 7PM and 12 midnight. Or maybe you want a way where you can pick a time, plug it into a formula, and the average rolls out of it.

    You might want to interpolate, but there are many ways of interpolating. A piece-wise linear interpolation of successive datapoints is the simplest, but it's also crude. And even then there are decisions to be made. Suppose you want the average level at 11AM, and you have datapoints at 9:15AM and 12:30PM on day 1, and at 8:30AM and 11:55AM on day 2. You can do interpolation between the 9:15AM and 11:55AM times - they are the times closest to 11:00AM, but they are from different days. You can also first interpolate between the two points of the first day, then interpolate between the two points of the second day, and then average.
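
    As a minimal sketch of that piecewise-linear approach, assuming one day's readings are kept as [minutes-since-midnight, value] pairs (the sub name lerp_at and the data layout are made up for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Linearly interpolate a reading at $target (minutes since midnight)
    # from the two surrounding samples of a single day.
    # @samples is a list of [minutes, value] pairs, sorted by time.
    sub lerp_at {
        my ($target, @samples) = @_;
        for my $i (0 .. $#samples - 1) {
            my ($t0, $v0) = @{ $samples[$i] };
            my ($t1, $v1) = @{ $samples[$i + 1] };
            if ($target >= $t0 && $target <= $t1) {
                my $frac = ($target - $t0) / ($t1 - $t0);
                return $v0 + $frac * ($v1 - $v0);
            }
        }
        return undef;    # target falls outside the sampled range
    }

    # Day 1 samples: 07:01 => 82, 11:38 => 172, 22:48 => 154
    my @day1 = ( [ 7*60 + 1, 82 ], [ 11*60 + 38, 172 ], [ 22*60 + 48, 154 ] );
    printf "Estimated 11:00 reading: %.1f\n", lerp_at(11*60, @day1);

    Averaging the per-day estimates at the same target time then gives the second of the two options above.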

    But is linear interpolation appropriate? Many biometric measurements are supposed to vary during the day, perhaps according to a sine wave. You can then do a better approximation of the value by taking this into account. You can even increase accuracy by doing the interpolation (be it linear or something else) over more points. But not blindly: it can also decrease the accuracy, especially if you use the "wrong" interpolation.

    Alternatively, you can make a "best fit" curve using all the measurements of a day, or even all the measurements. With a best fit curve, you typically first decide what form the curve should have (a line, a parabola, a sine wave, etc), and then from all the possible curves of that form, you find the one that minimizes some distance from your measured points. An often used distance is the sum of the squares of the distances of all the measured points to the curve. There are standard techniques for this.
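
    As a minimal plain-Perl sketch of that least-squares idea, fitting the simplest curve form (a straight line) via the closed-form normal equations; the sub name fit_line and the sample data are made up for illustration:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Least-squares fit of a line y = a + b*x to [x, y] pairs,
    # minimizing the sum of squared vertical distances to the line.
    sub fit_line {
        my @points = @_;
        my $n = @points;
        my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
        for my $p (@points) {
            my ($x, $y) = @$p;
            $sx  += $x;
            $sy  += $y;
            $sxx += $x * $x;
            $sxy += $x * $y;
        }
        my $b = ($n * $sxy - $sx * $sy) / ($n * $sxx - $sx * $sx);
        my $a = ($sy - $b * $sx) / $n;
        return ($a, $b);    # intercept, slope
    }

    # x = minutes since midnight, y = glucose reading
    my @readings = ( [ 421, 82 ], [ 698, 172 ], [ 1368, 154 ] );
    my ($a, $b) = fit_line(@readings);
    printf "fit: y = %.2f + %.4f * x\n", $a, $b;

    For a sine-shaped or higher-order curve the same principle applies; only the formula for the coefficients changes.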

    4:30AM (when I get up days I go to the gym)
    Man, that can't be healthy. 4:30AM is a time to say goodnight on IRC, write a final post on Perlmonks, save your day's work in CVS, check to see if there's a tape in your backup unit, and hide in your bed from that big hot thing that's about to rise into the skies.

    Abigail

      I question the usefulness of calculating the average. The doctor already knows what a baseline glucose level is supposed to be. I suspect he would be more interested in viewing just a raw plot of the data over whatever period of time the tests were taken; it should be immediately visually apparent where the spikes are, and whether the glucose levels are generally below or above the norm... blue_cowdawg, has your physician given you any indication of what data would be of interest to him/her?
      daN.
Re: Data averages by time of day
by gmax (Abbot) on Feb 26, 2004 at 17:05 UTC

    This sort of problem can be handled with a DBMS engine.

    Something like this?

    #!/usr/bin/perl -w
    use strict;
    use DBI;

    my @data = (
        ['2004-02-01 07:01',  82],
        ['2004-02-01 11:38', 172],
        ['2004-02-01 22:48', 154],
        ['2004-02-02 05:38', 107],
        ['2004-02-02 13:20', 117],
        ['2004-02-02 23:48', 188],
    );

    my $dbh = DBI->connect("dbi:SQLite:averages","","")
        or die $DBI::errstr;

    unless ( -s "averages" ) {
        $dbh->do(qq{
            CREATE TABLE averages (
                d   datetime not null,
                val INTEGER)
        }) or die $DBI::errstr;
        for (@data) {
            $dbh->do( qq{INSERT INTO averages VALUES (?, ?)}, undef, @$_)
                or die $DBI::errstr;
        }
    }

    my $query = qq{
        SELECT substr(d,1,10) as day,
               CASE
                   WHEN substr(d,12,2) BETWEEN  4 AND 12 THEN "1 morning"
                   WHEN substr(d,12,2) BETWEEN 13 AND 18 THEN "2 afternoon"
                   WHEN substr(d,12,2) BETWEEN 19 AND 24 THEN "3 evening"
                   ELSE "4 night"
               END AS tm,
               AVG(val)
          FROM averages
         GROUP BY day, tm
         ORDER BY day, tm
    };

    my $averages = $dbh->selectall_arrayref($query)
        or die $DBI::errstr;

    printf "%-10s %-15s %5.2f\n", @$_ for @$averages;

    __END__
    output:
    2004-02-01 1 morning       127.00
    2004-02-01 3 evening       154.00
    2004-02-02 1 morning       107.00
    2004-02-02 2 afternoon     117.00
    2004-02-02 3 evening       188.00

    You can add/remove/modify the intervals by acting on the CASE statement inside the query.

    If you want an average by time only, without the days, then use this query:

    my $query = qq{
        SELECT CASE
                   WHEN substr(d,12,2) BETWEEN  4 AND 12 THEN "1 morning"
                   WHEN substr(d,12,2) BETWEEN 13 AND 18 THEN "2 afternoon"
                   WHEN substr(d,12,2) BETWEEN 19 AND 24 THEN "3 evening"
                   ELSE "4 night"
               END AS tm,
               AVG(val)
          FROM averages
         GROUP BY tm
         ORDER BY tm
    };

    my $averages = $dbh->selectall_arrayref($query)
        or die $DBI::errstr;

    printf "%-15s %5.2f\n", @$_ for @$averages;

    __END__
    output:
    1 morning       120.33
    2 afternoon     117.00
    3 evening       171.00
     _  _ _  _  
    (_|| | |(_|><
     _|   
    

          Something like this?

      Something along those lines was one of the ideas I was playing around with but with finer blocks of time.

      select hg from readings where time > "04:00" and time < "04:30";
      . . . and so on
      and then averaging the time slots. Where I have no values in a given time slot I could do linear interpolation. Abigail makes a very valid point that I am still trying to wrap my head around how to implement: the theory (fact?) that there may be a sinusoidal relationship to consider as well.
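
      As a rough variation on gmax's query above, grouping by half-hour time-of-day slot instead of by day part (it assumes the same "averages" table with columns d and val):

      my $query = qq{
          SELECT substr(d, 12, 2) || ':' ||
                 CASE WHEN CAST(substr(d, 15, 2) AS INTEGER) < 30
                      THEN '00' ELSE '30' END AS slot,
                 AVG(val)
            FROM averages
           GROUP BY slot
           ORDER BY slot
      };
      my $slots = $dbh->selectall_arrayref($query) or die $DBI::errstr;
      printf "%-6s %6.2f\n", @$_ for @$slots;

      Slots with no samples simply don't show up in the output, which is where the interpolation step would come in.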

      I am beginning to think that I have a good basis for a posting to CUFP when I get this working.


      Peter L. Berghold -- Unix Professional
      Peter at Berghold dot Net
         Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.
Re: Data averages by time of day
by kappa (Chaplain) on Feb 26, 2004 at 19:18 UTC
    Interpolation probably isn't relevant in this case. The best-fit curve (or approximation) which Abigail mentions is. This is why: we want averages (read: good, useful values) at arbitrary Xs, while interpolation produces a function which will exactly evaluate to those Ys at those Xs we used as data for the interpolation. Nothing is said about Xs between the original datapoints; they can be anything. Think of a sine wave which perfectly interpolates the points (0, 0), (pi, 0), ..., (n*pi, 0).

    Approximation minimizes the distance between the curve and each of the datapoints (the data cloud) to produce exactly what you want: a function to calculate a useful value at any point of the X axis.

    Give Math::Approx a try. Like this:

    use Math::Approx;

    # hh:mm => minutes since midnight (helper assumed by this snippet;
    # it is not part of Math::Approx)
    sub mins_from_midnight {
        my ($h, $m) = split /:/, shift;
        return $h * 60 + $m;
    }

    my %data = (
        mins_from_midnight('07:01') =>  82,
        mins_from_midnight('23:48') => 188,
        # ...
    );
    my $approx = Math::Approx->new(undef, 3, %data);
    print $approx->approx(mins_from_midnight('15:40'));

    Update: and yes, the day you took a specific datapoint does not matter. Anyway, you want an average, and that means all the days are the same for your task. That's ok.

      I love it! :-)


      Peter L. Berghold -- Unix Professional
      Peter at Berghold dot Net
         Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.
      To get at the formula that results from the approximation, try this: (The normal Math::Approx methods also work on Math::Approx::Symbolic objects.)
      use Math::Approx::Symbolic;

      my %data = (
          (7*60+1)   =>  82,
          (23*60+48) => 188,
      );
      my $approx = Math::Approx::Symbolic->new(undef, 3, %data);
      my $sym    = $approx->symbolic();
      print "$sym\n";

      my $prettier = $sym->to_latex();
      print "$prettier\n";

      # Computing y-values:
      print $sym->value(x => time_of_day()), "\n";

      # Doing that faster:
      use Math::Symbolic::Compiler;
      my ($closure) = $sym->to_sub();
      print $closure->(time_of_day()), "\n";

      # Way faster:
      use Math::Symbolic::Custom::CCompiler;
      my $inlined_c = $sym->to_compiled_c();
      print $inlined_c->(time_of_day()), "\n";
Re: Data averages by time of day
by TomDLux (Vicar) on Feb 26, 2004 at 15:57 UTC

    This is a standard problem in real-life time-series data collection. In the classroom, data are collected at T = 0, 1, 2, ...; in real life, the collection times have a certain amount of jitter, and some data values may be missing altogether. If you're serious about the correct solution, talk to a mathematician or experimental scientist, or check out web sites dealing with math & stats. ( I would start with Wolfram and work out from there. )

    If your interest is less rigorous, I see two alternatives:

    1. Collect data over a week or two, to get a representative sample of the time slots when you check your glucose. On any one day you only collect three data samples out of, say, seven typical times, so the other four slots are empty.
    2. Interpolate from the actual values to determine the theoretical values at the theoretical times.
    3. Ask your doctor / give your doctor the raw data.
    4. Some of the glucometers give the impression they don't require a lancet. Get one of those and an alarm clock, and take your glucose reading at the correct time, whether you're in a meeting, at lunch, or in the washroom.

    Yeah, I know I have trouble with small integers.

    This reminds me of problems discussed by ... I think it was Douglas R. Hofstadter: If "when I wake up" is either 4:30 or 6:00, and "lunchtime" is either 12:00 or 2:45, which set does 10:15 belong to, and what's the equivalent time in the other?

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: Data averages by time of day
by Fletch (Bishop) on Feb 26, 2004 at 15:03 UTC

    Not that it's got anything to do with perl directly . . . :) Check out RRDtool which handles data just like this.

          Check out RRDtool which handles data just like this.

      Good thought, but not really what I'm after. I do plan on this, but it is just a subset of a bunch of reports that I am going to generate to give to a doctor.

      UPDATE:

      The other half of the problem, which is out of scope for this thread, is the fact that the Perl script I am writing also extracts that data from the Palm Pilot I use to record my readings.


      Peter L. Berghold -- Unix Professional
      Peter at Berghold dot Net
         Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.
        So are you looking to group the day into "day parts" of the same size each day (say morning, lunch, afternoon and evening) and then group tests by which part they fall in? Or more like, given 3-4 tests as points throughout the day, put curved lines between the points so you can "guess" what the glucose reading was at some point between your tests? I would go with the day-part method, only because whatever you do to try and guess what the reading would have been at a given point, I suspect it will be wrong because of factors like when/what you ate.
        daN.
Re: Data averages by time of day
by RandomWalk (Beadle) on Feb 26, 2004 at 15:47 UTC
    Peter

    Do you have PDL installed? Convert your times to scalars (minutes, say). Then every day you have two piddles, one of times and one of readings. Use PDL's interpolation (linear interpolation is probably all you want) to put all days' readings onto a common grid.
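
    A minimal sketch along those lines, using interpol from PDL::Primitive for the linear interpolation (the grid spacing and variable names are just for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDL;

    # One day's samples: times in minutes since midnight, and readings.
    my $t = pdl(421, 698, 1368);     # 07:01, 11:38, 22:48
    my $v = pdl(82, 172, 154);

    # Common half-hourly grid from 04:30 to 23:30, in minutes.
    my $grid = sequence(39) * 30 + 270;

    # Linear interpolation of the readings onto the grid.
    my $on_grid = interpol($grid, $t, $v);
    print $on_grid, "\n";

    Doing this for each day and then averaging across days, grid point by grid point, gives the time-of-day averages.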

Re: Data averages by time of day
by dragonchild (Archbishop) on Feb 26, 2004 at 15:35 UTC
    It looks like you eventually want to get a smooth curve for the hours of the day, given the data from a group of days. I'd look at some basic statistical analysis. (No, I have no suggestion which ones - sorry.)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

          It looks like you eventually want to get a smooth curve for the hours of the day, given the data from a group of days.

      Exactly. And statistics was my mathematical weak area when I was in school. I was great with math other than that, but with statistics I seemed to have trouble figuring out which algorithm to apply to which problem.


      Peter L. Berghold -- Unix Professional
      Peter at Berghold dot Net
         Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.
Re: Data averages by time of day
by NetWallah (Canon) on Feb 26, 2004 at 19:16 UTC
    I'm surprised no one has mentioned using MRTG to graph this.

    The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network-links. MRTG generates HTML pages containing graphical images which provide a LIVE visual representation of this traffic.

    Although it talks about "network traffic", it has been used to monitor lots of other things, like weather, surf, temperature...

    MRTG optionally can work with rrdtool or use its own file-based storage.

    While it does not directly give averages, it provides a great graphic display, and is very useful for observing data over time.

    Yes - it is written in perl. Yes, it is free. Yes, it is good!

    "Experience is a wonderful thing. It enables you to recognize a mistake when you make it again."
Re: Data averages by time of day
by signal9 (Pilgrim) on Feb 26, 2004 at 16:19 UTC

    It looks like the problem is not one of setting stats against clock hours, but against hours within "your day", so the daily interval will be of varying length and will periodically shift against the backdrop of the clock. It's a stretch, but perhaps one of the AI::Fuzzy* modules could help. If breakfast could fall anywhere from 4:30am to 7:00am, then rather than encompass that span of time in a fixed segment, create a 'breakfast' time segment with fuzzy edges. That way, rather than asking of your data, "What is my glucose level at 6:00am?", you can ask, "What is my glucose level in the morning?"

    Just a thought, and not thought out thoroughly.
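
    As a plain-Perl illustration of that fuzzy-segment idea (it does not use the AI::Fuzzy API; the segment boundaries and sub name are invented), a triangular membership function for a 'breakfast' segment could look like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Triangular fuzzy membership for a "breakfast" segment with soft edges:
    # 0 before 04:00, rising to 1 at 06:00, back to 0 by 08:00.
    # Times are minutes since midnight; the boundaries are invented.
    sub breakfast_membership {
        my $t = shift;
        my ($lo, $peak, $hi) = (4 * 60, 6 * 60, 8 * 60);
        return 0 if $t <= $lo or $t >= $hi;
        return ($t - $lo) / ($peak - $lo) if $t < $peak;
        return ($hi - $t) / ($hi - $peak);
    }

    # Weighted "morning" average: each reading counts according to
    # how strongly its time belongs to the breakfast segment.
    my @readings = ( [ 5*60 + 38, 107 ], [ 7*60 + 1, 82 ], [ 11*60 + 38, 172 ] );
    my ($sum, $weight) = (0, 0);
    for my $r (@readings) {
        my $w = breakfast_membership($r->[0]);
        $sum    += $w * $r->[1];
        $weight += $w;
    }
    printf "Breakfast-time average: %.1f\n", $sum / $weight if $weight;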

Re: Data averages by time of day
by diskcrash (Hermit) on Feb 26, 2004 at 23:43 UTC
    Dear Blue,

    There are at least three separate issues to consider:

    1. That blood glucose level is a function of food intake and exercise and that you probably can't easily take readings that reflect all of the effects, in a timely way. So the flow of data is probably undersampled.

    2. That an average value is only one of several relatively easy metrics of interest. You can compute a 3, 4 or 5 point rolling average (when a new data point comes in, drop the earliest one from the average and move the data window along). You might also track the minimum, maximum or the median (the "middle point") during a 24 hour day; see the sketch after this list.

    3. Look at the mission. You track glucose to determine if your treatment is right. Is the right amount of the right medication, combined with the right diet and exercise controlling your blood glucose level? Tracking daily maxima and morning fasting levels might tell you a lot about how well you are controlled. Imagine also a "glucose control index" that you might invent. For example, (morning fasting level / maximum during any point in the day) and plot or track this ratio to see how your body is responding to diet and meds. Talk to your doc about what they feel are the most important attributes about glucose tracking. Some meds are really "peaky" where others provide more gradual control. These are the kind of features that can tell you about the underlying mechanism.
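
    A minimal sketch of the rolling average and the daily min/max/median mentioned in point 2 (the window size and sample values are just for illustration):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use List::Util qw(min max sum);

    # One day's readings in time order (values invented for illustration).
    my @readings = (107, 117, 188, 154, 82, 172);

    # 5-point rolling average: drop the oldest value as each new one arrives.
    my $window = 5;
    for my $i ($window - 1 .. $#readings) {
        my @w = @readings[ $i - $window + 1 .. $i ];
        printf "rolling average ending at sample %d: %.1f\n", $i, sum(@w) / $window;
    }

    # Daily minimum, maximum and median.
    my @sorted = sort { $a <=> $b } @readings;
    my $median = @sorted % 2
               ? $sorted[ $#sorted / 2 ]
               : ( $sorted[ @sorted / 2 - 1 ] + $sorted[ @sorted / 2 ] ) / 2;
    printf "min %d, max %d, median %.1f\n", min(@readings), max(@readings), $median;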

    Best of luck to you and keep up the fight.

    Diskcrash

Re: Data averages by time of day
by johndageek (Hermit) on Feb 26, 2004 at 20:02 UTC
    Non-perlish question:How long have you played the glucose monitoring game?

    I am asking because most glucose monitors allow a doctor's office to grab the information right from the monitor and print out a nice set of graphs for the M.D.

    Assuming the doc has done an A1C, and now wants more detailed information, he may not need a perl solution.

    Bet you can't guess why I am posting this. Yup, I played with the numbers and made a cool presentation and so on, then the doc plugged the monitor in, printed his graphs and tada! My app was obsolete. Still a fun problem though.

    The information that might add a lot to the readings is time since medication and time since eating and time since exercising.

    Good luck!

    Enjoy!
    Dageek

          Non-perlish question:How long have you played the glucose monitoring game?

      For about 15 years now. My experiences have been very mixed on this score vis-a-vis the doctor. None of the doctors I've visited had the technology to plug my meter in (One Touch Profile) and download the results.

      My current doctor (soon to be my ex-doctor) didn't even look at the reports I generated for him. I am in the throes of going to a new one, but I want to go in "armed and dangerous."


      Peter L. Berghold -- Unix Professional
      Peter at Berghold dot Net
         Dog trainer, dog agility exhibitor, brewer of fine Belgian style ales. Happiness is a warm, tired, contented dog curled up at your side and a good Belgian ale in your chalice.
        Hi,
        I've been a diabetic for 20 years but only decided to start serious monitoring 3 years ago. I've been testing my blood 20-25 times a day for 3 years. Over that time I've switched from meter to meter and I'm now using the One Touch Ultra. It has some statistical analysis on it, but it's still poor (in my opinion). A few years ago I started with an Excel spreadsheet and it has evolved to a point where I find it to be very effective. It tracks a 24 hour period and has a graph. The user enters the time of the test and the reading. The sheet will calculate the velocity, acceleration and the jerk (the rate of change of acceleration) and has things like the day's average, mean and standard deviation. I've found that my doctor isn't interested in these numbers, but he does like the 7, 14, 30, 60 and 90 day averages that my meter produces. Anyway, if anyone is interested in my spreadsheet, I'll send it...
        Cheers,
        Trevor Keppel-Jones.
        tkjones@jetnet.ca
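
        As a rough sketch of those successive differences (velocity, acceleration, jerk), not Trevor's spreadsheet logic, assuming readings come as [minutes-since-midnight, value] pairs:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Successive finite differences of timestamped readings:
        # the first difference approximates velocity, the second acceleration,
        # the third jerk. Input: [minutes, value] pairs in time order.
        sub differences {
            my @series = @_;
            my @out;
            for my $i (1 .. $#series) {
                my $dt = $series[$i][0] - $series[$i - 1][0];
                my $dv = $series[$i][1] - $series[$i - 1][1];
                # place the rate at the midpoint of the interval
                push @out, [ ($series[$i][0] + $series[$i - 1][0]) / 2, $dv / $dt ];
            }
            return @out;
        }

        my @readings     = ( [ 338, 107 ], [ 800, 117 ], [ 1428, 188 ] );
        my @velocity     = differences(@readings);
        my @acceleration = differences(@velocity);
        printf "velocity at t=%.0f: %+.3f mg/dL per minute\n", @$_ for @velocity;
        printf "acceleration at t=%.0f: %+.5f\n", @$_ for @acceleration;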
Re: Data averages by time of day
by Hutch (Acolyte) on Feb 27, 2004 at 03:18 UTC
    Hey Blue, I was reading the posts on your problem and I would probably tackle it the same way. As an academic exercise, regardless of whether the doctor finds it useful or not, the first thing you'd need to do is set ranges for the time schedules. Larger ranges (say 4am to 5am) would be less specific, but tighter ranges (4:00am-4:05am) would probably not gather as many time points as you would want to get an average.

    If you want an easy way to do this, I would probably set up a small Perl script to write an interface file from form data. I'd use a dropdown box to let them pick the time ranges, and have each range assigned a numerical value that corresponds to the line in the array where the data is stored (i.e., line 1 is 4-4:30, line 2 is 4:30-5, etc.). The Perl script would need to parse the data from the form, read the current data from the file into an array, dump the new data from your form into the array, and rewrite the file. You would then have a file with all of the data, categorized by timepoint. The last step is to create a script that reads the array and averages the values stored in it (a rough sketch follows). Hope that helps. - Hutch
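
    A minimal sketch of that last averaging step, assuming a hypothetical data file with one "slot_index,reading" pair per line (the file name and format are invented):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Average readings per time slot, where slot 1 is 4:00-4:30,
    # slot 2 is 4:30-5:00, and so on. The file name is hypothetical.
    my %by_slot;
    open my $fh, '<', 'glucose_slots.txt' or die "glucose_slots.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($slot, $reading) = split /,/, $line;
        push @{ $by_slot{$slot} }, $reading;
    }
    close $fh;

    for my $slot (sort { $a <=> $b } keys %by_slot) {
        my @vals = @{ $by_slot{$slot} };
        my $avg  = 0;
        $avg += $_ for @vals;
        $avg /= @vals;
        printf "slot %2d: average %.1f over %d readings\n", $slot, $avg, scalar @vals;
    }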