

in reply to How do I synchronize data to common timestamp

By no means an answer, but just food for thought. One thing you might consider is to keep the timestamps as they are and not worry so much about normalizing them up front.

I was coming up with a way of modeling stats for authenticated SMTP spam and trying to get a breakdown of hourly counts per relay IP and per customer email address. It ended up being much simpler to store the raw values (epoch timestamp, IP, email) in a MySQL table and put some of the heavy lifting on the SQL side.
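For illustration, the table behind this can be dead simple. The column names below match what the query further down selects; the exact types and the index are just a sketch of what I'd reach for, not necessarily what's right for your volume:

    -- Raw values stored as-is; one row per delivery attempt.
    -- Types and index here are a guess, not my actual schema.
    CREATE TABLE relay_report (
        timestamp INT UNSIGNED NOT NULL,   -- epoch seconds, unconverted
        relay     VARCHAR(45)  NOT NULL,   -- relay IP address
        email     VARCHAR(255) NOT NULL,   -- authenticated customer address
        KEY (timestamp)                    -- keeps the time-ranged selects cheap
    );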

The date handling in MySQL is quite good. I used statements like the one below to give me a very simple-to-process set of rows to report on.

So, to give myself the last day's worth of data broken down by hour, I would do something like this:

    SELECT timestamp,
           CONVERT_TZ(FROM_UNIXTIME(timestamp - timestamp % 3600), '+00:00', '-5:00') AS Hourstamp,
           CONVERT_TZ(FROM_UNIXTIME(timestamp), '+00:00', '-5:00') AS 'DatestampEDT',
           FROM_UNIXTIME(timestamp) AS 'Datestamp',
           relay,
           email
    FROM relay_report
    WHERE FROM_UNIXTIME(timestamp) >= DATE_SUB(NOW(), INTERVAL 23 HOUR)

The important bit is the modulus math: timestamp - timestamp % 3600 rounds each epoch value down to the top of its hour (e.g. 1369756923 - 123 = 1369756800). So each row comes back with Hourstamp as the top of that row's hour, the raw datestamp in Eastern time (yes, I eventually got around to installing the time zone tables in MySQL and fixed this...), the raw datestamp in GMT (all our infrastructure uses GMT), and then the relay IP and email address. I could then easily group things by that hour stamp and have a nice breakdown of hourly rates. I had other things I wanted to do with the data while I was processing it, which is why I didn't just do a GROUP BY right in the SQL statement, but the GROUP BY version works too (see the sketch below).
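If all you need is the hourly counts and nothing else, the GROUP BY version is short; this is just a sketch using the same table and bucket expression as above:

    -- Hourly message counts per relay for the last day.
    SELECT FROM_UNIXTIME(timestamp - timestamp % 3600) AS Hourstamp,
           relay,
           COUNT(*) AS messages
    FROM relay_report
    WHERE FROM_UNIXTIME(timestamp) >= DATE_SUB(NOW(), INTERVAL 23 HOUR)
    GROUP BY Hourstamp, relay
    ORDER BY Hourstamp, relay;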

My primary goal was to come up with a way of detecting compromised customer accounts that were being used to relay spam via authenticated SMTP, but I was also giving other departments a way to look at these stats. Doing it this way made it ridiculously easy to point Excel at the data through the ODBC connector and pull it directly, because it was then trivial to create a quick pivot table and a few graphs and charts that were good for a few oohs and aahs on the conference room projector. (Plus, date and timestamp handling in Excel is so effing difficult I half expected the Excel elites in there to hold me down with a blanket and pummel me with thick manuals because I'd handled the timestamps perfectly before the data ever got to Excel.)

Given your raw sample data, I'd probably do something similar. Start stuffing your epoch timestamp, key, and value into db rows as-is, then select the info you want using a method like the one I demonstrated. Play with the modulus math if you want to group by smaller increments (60 seconds or whatever); a sketch follows.
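For instance, swapping 3600 for 60 buckets the data by minute instead. I'm guessing at a table called sample_data with ts/name/value columns to hold your key/value data, so adjust to taste:

    -- Per-minute rollup of each metric; table and column names are hypothetical.
    SELECT FROM_UNIXTIME(ts - ts % 60) AS Minutestamp,
           name,                       -- the metric key
           AVG(value) AS avg_value     -- or MAX/MIN/SUM, whatever fits your data
    FROM sample_data
    GROUP BY Minutestamp, name
    ORDER BY Minutestamp, name;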

--
Andy

Replies are listed 'Best First'.
Re^2: How do I synchronize data to common timestamp
by lewnewby (Novice) on May 28, 2013 at 17:03 UTC

    OK, thanks for the thoughts, guys.

    I could probably upload my scripts and associated modules, but I can't upload the SDK here without all sorts of legal hoops, so unless you have that and a NetApp environment it would be hard to test. I'll keep plodding away and see what I can manage with Data::Dumper.