Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

By no means an answer, but just food for thought. One thing you might consider is to just keep the timestamps as they are not worry so much about normalizing them up front.

I was coming up with a way of modeling stats for authenticated smtp spam and was trying to get a breakdown of hourly counts per relay ip and per customer email address. It ended up being much simpler to just store the raw values (epoch timestamp, ip, email) in a mysql table and then put some of the heavy lifting on the sql side.

The date handling you can use in mysql is quite good. I used statements like below to give me a very simple-to-process array of data on which to report.

So, to give myself the last the day's worth of data broken down by hour, I would do something like this:

SELECT timestamp, CONVERT_TZ(FROM_UNIXTIME(timestamp - timestamp % 3600 ), '+00:00', + '-5:00') AS Hourstamp, CONVERT_TZ(FROM_UNIXTIME(timestamp), '+00:00', '-5:00') AS 'Datest +ampEDT', FROM_UNIXTIME(timestamp) AS 'Datestamp', relay, email FROM relay_report WHERE FROM_UNIXTIME(timestamp) >= DATE_SUB(NOW(), INTERVAL 23 HOUR)

The imporant bit is the modulus math. This would give me the rows with Hourstamp being the time at the top of the current row's hour, the raw datestamp in Eastern time (yes, I eventually got around to installing the time zones in mysql and fixed this...), the raw datestamp (all our infrastructure uses GMT), and then the relay ip and email address. I could then easily group things by that hour stamp and have a nice breakdown of hourly rates. I had other things I wanted to do with the data while I was processing it which is why I didn't just do a GROUP BY right in the sql statement.

My primary goal was to come up with a way of detecting compromised customer accounts that were being used to relay spam via authenticated smtp, but I also was giving other departments a way to look at these stats. Doing it this way proved ridiculously easy to just run Excel with the odbc connector to pull data directly, because it was then trivial to create a quick pivot table and a few graphs and charts that were good for a few oohs and aahs on the conference room projector. (Plus date and time stamp handling in Excel is so effing difficult I half expected the Excel elites in there to hold me down with a blanket and pummel me with thick manuals because I'd handled the timestamps perfectly before the data ever got to Excel.)

Given your raw sample data, I'd probably do something similar. Start stuffing your epoch timestamp, key, and value into db rows as-is. Then select the info you want using a similar method to what I demonstrated. Play with the modulus math if you want to group by smaller increments, 60 seconds or whatever.


In reply to Re: How do I synchronize data to common timestamp by naChoZ
in thread How do I synchronize data to common timestamp by lewnewby

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or or How to display code and escape characters are good places to start.
Log In?

What's my password?
Create A New User
Domain Nodelet?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (3)
As of 2021-12-02 18:52 GMT
Find Nodes?
    Voting Booth?
    R or B?

    Results (23 votes). Check out past polls.