Re: Re: Re: Adventures in optimization (or: the good and bad side of being truly bored)

by demerphq (Chancellor)
on Aug 04, 2003 at 07:54 UTC ( [id://280587] )


in reply to Re: Re: Adventures in optimization (or: the good and bad side of being truly bored)
in thread Adventures in optimization (or: the good and bad side of being truly bored)

on further investigation... 10s of thousands more .. calls

Are you sure? I thought the cache worked like this

$cache{$yearmonth}+$days*(24*60*60)+$hours*(60*60)+$mins*60+$secs;
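
(For $days, which is 1-based, to scale correctly there, the cached value for each $yearmonth would have to be midnight of a notional "day zero" of the month. A minimal sketch of the population step, assuming local time and no DST shift inside the month:)

    use Time::Local qw(timelocal);

    # Cache midnight of the 1st minus one day, so that adding
    # $days * 86400 for a 1-based day-of-month lands on the right
    # midnight (this ignores any DST shift inside the month).
    $cache{$yearmonth} ||= timelocal( 0, 0, 0, 1, $month - 1, $year - 1900 ) - 24 * 60 * 60;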

Having said that, I'm in the odd position that I too didn't know about the _nocheck option in Time::Local, and I also wrote my own caching for it, but I did it based on the hour. I am parsing dates like "20030401101123" and I end up doing something like the following (time_to_unix is just a wrapper around timelocal that knows how to split the above string fragment correctly):

($cache{substr($date,0,10)}||=time_to_unix(substr($date,0,10))) + substr($date,10,2)*60 + substr($date,12,2);
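
Fleshed out into something runnable, that might look like the following (the body of time_to_unix is my guess; only its name and purpose come from the post):

    use Time::Local qw(timelocal);

    my %cache;

    # Assumed implementation: take the first ten characters of the
    # stamp ("YYYYMMDDHH") and return the epoch time for the top of
    # that hour.
    sub time_to_unix {
        my ( $y, $mo, $d, $h ) = unpack 'A4 A2 A2 A2', shift;
        return timelocal( 0, 0, $h, $d, $mo - 1, $y - 1900 );
    }

    # "20030401101123" -> 2003-04-01 10:11:23 local time
    my $date  = "20030401101123";
    my $epoch = ( $cache{ substr $date, 0, 10 } ||= time_to_unix( substr $date, 0, 10 ) )
              + substr( $date, 10, 2 ) * 60
              + substr( $date, 12, 2 );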

which also gave me a several-thousandfold speedup in my time calculations. Incidentally, I think this approach will probably significantly outperform using timelocal() (and its caching) directly. The hash lookup on the first section of the date is far cheaper than splitting the date, passing all of its parts on the stack, and having timelocal do its checks and its own caching, which presumably resembles

$cache{$year.$month}

anyway, and then getting the results back over the stack. We trade many ops for just a few. And we get a cool bonus: since Time::Local is still validating its input, the cache actually acts as a validating filter too. Only valid YMDHs get into it, and if we don't have a hit we either have an unknown valid YMDH or a bad date, both of which Time::Local handles for us. So we get a serious speed benefit without losing any of the safety of Time::Local.
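
To illustrate the filtering effect (my own example, not from the thread): an impossible date never gets a cached value, because timelocal() croaks before the ||= assignment can complete.

    # "Feb 31st" is impossible; Time::Local croaks, so nothing true
    # is ever stored under this key and later lookups still fall
    # through to the validating call.
    my $bad = "2003023110";
    my $t = eval { $cache{$bad} ||= time_to_unix($bad) };
    print defined $cache{$bad} ? "cached\n" : "not cached\n";   # not cached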


---
demerphq

<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

Re: Re: Re: Re: Adventures in optimization (or: the good and bad side of being truly bored)
by revdiablo (Prior) on Aug 04, 2003 at 18:40 UTC

    demerphq++. Thanks for the reply. I did subsequently benchmark Time::Local with _nocheck, and while it was faster than without _nocheck, my home-brew cache was still substantially faster. Interesting that you decided to cache at the hour level, rather than the day. I chose the day level because converting hours to seconds is a relatively trivial calculation, but then again I guess converting days to seconds is too, so maybe caching at the month level would be just as good.

    Now I wonder if the different caching level is the reason _nocheck is still slower than my cache. Perhaps it's due to the additional subroutine call, and not the different caching at all. But again, this is all rank speculation... (I'm actively resisting the urge to break out my benchmark.pl and test hour-, day-, and month-level caching, but I think I need to just be happy with the performance I've got.)
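
    (For the curious, here is roughly what such a test could look like. This is my own sketch using the core Benchmark module, not revdiablo's benchmark.pl; note that only the hour-level cache is immune to DST, since the day- and month-level arithmetic assumes no DST shift inside the cached unit.)

        use Benchmark qw(cmpthese);
        use Time::Local qw(timelocal);

        # "YYYYMMDDHHMMSS" -> epoch seconds, uncached
        sub epoch {
            my ( $y, $mo, $d, $h, $mi, $s ) = unpack 'A4 A2 A2 A2 A2 A2', shift;
            return timelocal( $s, $mi, $h, $d, $mo - 1, $y - 1900 );
        }

        # 10,000 synthetic stamps spread over one month
        my @stamps = map {
            sprintf '200304%02d%02d%02d%02d', 1 + $_ % 28, $_ % 24, $_ % 60, ( $_ * 7 ) % 60
        } 1 .. 10_000;

        my ( %hour, %day, %month );

        cmpthese( -2, {
            hour => sub {
                for (@stamps) {
                    my $t = ( $hour{ substr $_, 0, 10 } ||= epoch( substr( $_, 0, 10 ) . '0000' ) )
                          + substr( $_, 10, 2 ) * 60 + substr( $_, 12, 2 );
                }
            },
            day => sub {    # assumes no DST jump inside the day
                for (@stamps) {
                    my $t = ( $day{ substr $_, 0, 8 } ||= epoch( substr( $_, 0, 8 ) . '000000' ) )
                          + substr( $_, 8, 2 ) * 3600 + substr( $_, 10, 2 ) * 60 + substr( $_, 12, 2 );
                }
            },
            month => sub {  # assumes no DST jump inside the month
                for (@stamps) {
                    my $t = ( $month{ substr $_, 0, 6 } ||= epoch( substr( $_, 0, 6 ) . '01000000' ) )
                          + ( substr( $_, 6, 2 ) - 1 ) * 86400
                          + substr( $_, 8, 2 ) * 3600 + substr( $_, 10, 2 ) * 60 + substr( $_, 12, 2 );
                }
            },
        } );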

    PS: Based on your reply here and to my post about moving averages, I have to wonder if you're not doing something relatively similar? Hopefully my posts have been somewhat helpful to you, but it seems more likely that your posts have been more helpful to me. ;)

    Update: Just thought I might clarify a bit:

    on further investigation... 10s of thousands more .. calls

    Are you sure? I thought the cache worked like this ...

    I meant 10s of thousands more calls to timelocal. Your example is essentially how my cache works (though there are a few things I notice that would probably make it a touch quicker than mine). My log has 10s of thousands of entries between each unique day (an entry every 5 seconds, to be precise, so 86400/5 = 17,280 entries per day), so using Perl's math operations instead of a call to timelocal for all those entries is a huge win.

      Interesting that you decided to cache at the hour level, rather than the day.

      I chose the hour and not the day because it ended up producing a "reasonable" number of entries in the cache: as I typically deal with data spread over 30 days, my cache is usually around 720 entries (30 days × 24 hours). If you are dealing with times that are spread over only a day, then I would suggest you go to the minute level of resolution, which would mean a cache of around 1440 entries (24 hours × 60 minutes). Both numbers are actually low when you factor in the behaviour of Perl's hashes and the amount of space actually used up.
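
      A minute-level variant (my sketch, not something from the post) would key on the first twelve characters of the stamp, leaving only the seconds to add:

          use Time::Local qw(timelocal);

          my %cache;

          # Hypothetical helper: "YYYYMMDDHHMM" -> epoch of that minute
          sub minute_to_unix {
              my ( $y, $mo, $d, $h, $mi ) = unpack 'A4 A2 A2 A2 A2', shift;
              return timelocal( 0, $mi, $h, $d, $mo - 1, $y - 1900 );
          }

          my $t = ( $cache{ substr $date, 0, 12 } ||= minute_to_unix( substr $date, 0, 12 ) )
                + substr( $date, 12, 2 );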

      As for the analysis side of it, I think it's pretty clear. We are both manipulating strings. A hash lookup on a string of the sizes we are dealing with is far less work than dissecting the string into the required pieces and order (and perhaps supplying additional values), pushing them onto the stack, having timelocal pull them off the stack, build a key fragment it can use to check its own cache, and return the value over the stack again. If you add it up, it's probably 4 or 5 times more operations (depending on how you define the term) to call the subroutine, which in both of our cases will most likely be for a time we have already encountered.
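
      One quick way to put a number on that (my sketch; the ratio will vary by machine and Perl version) is to benchmark the cached lookup against calling Time::Local directly for a stamp that repeats, as log entries do:

          use Benchmark qw(cmpthese);
          use Time::Local qw(timelocal timelocal_nocheck);

          my %cache;
          my $stamp = '2003040110';    # "YYYYMMDDHH", seen many times in a real log

          cmpthese( -2, {
              direct => sub {
                  my ( $y, $mo, $d, $h ) = unpack 'A4 A2 A2 A2', $stamp;
                  my $t = timelocal( 0, 0, $h, $d, $mo - 1, $y - 1900 );
              },
              nocheck => sub {
                  my ( $y, $mo, $d, $h ) = unpack 'A4 A2 A2 A2', $stamp;
                  my $t = timelocal_nocheck( 0, 0, $h, $d, $mo - 1, $y - 1900 );
              },
              cached => sub {
                  my $t = $cache{$stamp} ||= do {
                      my ( $y, $mo, $d, $h ) = unpack 'A4 A2 A2 A2', $stamp;
                      timelocal( 0, 0, $h, $d, $mo - 1, $y - 1900 );
                  };
              },
          } );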


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
