PerlMonks
... Surely the root of all evil, but can't a guy be curious? First, a little background, and apologies for the length of this post.

I'm working on a legacy web system. When the user (or the server) encounters an exceptional situation, the web server generates an error "ticket" and displays a "page not found" page. Reports of these tickets are emailed to a mailing list - it's really handy if we lose a database from the cluster and dozens or hundreds (sometimes thousands!) of users start generating tickets. Unfortunately, we also get tickets when web bots crawl the site, etc. It's been one of my sub-goals to rid our "ticket" system of spurious tickets. Not generating a ticket when the user-agent matches a web bot was easy.

We have users and their sessions, and these are tracked through a number of redundant methods, one of which is appending the user ID and session information to the URI. The user ID format is very specific - [0-9A-Z]{12}. (Then a dot, then the session ID.) If our system sees lowercase letters, it considers this an exceptional situation and generates an error ticket. (Not my idea.) This is even though the ID would be valid if it were simply translated to uppercase letters.

For most users (and their user-agents) this isn't an issue. But it appears that some user agents attempt to pre- (or re-) fetch oft-visited pages (FunWebProducts springs to mind): ignore the proper URI, try to fetch using a lowercased ID, ticket gets generated, I get an "error" in my mailbox. Annoying. Should be easy to fix, no? Long story short, the old code did this:
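(The code sample itself didn't survive in this copy of the post; what follows is a hedged reconstruction of the logic as described above - the variable and function names are hypothetical.)

```perl
# Hypothetical shape of the legacy check; generate_ticket() and the
# surrounding names are assumptions based on the description above.
if ( $id_session =~ m/^[0-9A-Z]{12}\./ ) {
    # well-formed uppercase ID: carry on
}
else {
    generate_ticket($id_session);   # lowercased IDs land here
}
```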
My change was simple:
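(Also lost from this copy; per the description, the change amounts to folding the ID to uppercase before the check - a sketch, not the actual diff:)

```perl
# Fold any lowercase letters up front, so an otherwise-valid ID
# no longer trips the exception path. Note tr/// modifies in place.
$id_session =~ tr/a-z/A-Z/;
```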
I test it out on the development server. It works: it doesn't give me a "page not found" error, doesn't generate a ticket - problem solved. I ask my coworker for a code review. My coworker taps me on the shoulder and says, "So, we do this tr/// on every ID even though only a tiny subset of them are bad?" I clearly have NO idea how expensive that is. To borrow a phrase from bobf, I "went all perlmonks on him": "Blah blah blah premature optimization blah blah micro optimization blah blah blah." I essentially suggested we throw it on the servers and see if it causes a problem.

One giant long-running company meeting later, and I'm staring at 20 minutes until it's time to go home for the weekend. Honestly, my curiosity got the better of me, and I quickly and quietly set up a little Benchmark. Much to my surprise, my changed version performed better - around 30% better! Granted, I had only performed the operation on lowercased ID + session strings. Once I realized the folly of my ways and also tested properly uppercased ID + session strings, the results were closer. But it still intrigued me. Here's the benchmark code:
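(The benchmark code is missing from this copy of the post; here is a hedged reconstruction of its general shape - the test strings, subroutine names, and counts are all assumptions, and the tr/// calls deliberately operate in place, as the narrative suggests.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: a well-formed uppercase ID.session and a lowercased one.
my $upper = 'ABCDEFGH1234.sessiondata';
my $lower = 'abcdefgh1234.sessiondata';

cmpthese( -5, {
    # Old approach: only transliterate when the match fails.
    check_first => sub {
        for ($upper, $lower) {
            tr/a-z/A-Z/ unless /^[0-9A-Z]{12}\./;
        }
    },
    # New approach: transliterate unconditionally, then match.
    # Note: tr/// here modifies the strings themselves in place.
    tr_always => sub {
        for ($upper, $lower) {
            tr/a-z/A-Z/;
            /^[0-9A-Z]{12}\./;
        }
    },
} );
```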
And the output:
From that, I determined that a micro-optimization was possible for the original code - but the results seem fairly clear to me: performing the transliteration on every ID + session would be a net gain. But why? Adding the following (in the same manner) to the benchmark only added to my mystery:
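(These added entries are also missing here; going by the text, they did the same work through a temporary lexical copy, something like this sketch - names assumed. One detail worth noting: with a fresh copy per call, tr/// genuinely has lowercase characters to fold on every iteration, whereas an in-place tr/// leaves the string uppercase after the first pass.)

```perl
# Assumed shape of the added benchmark entries, using the same test
# strings as before but copying into a temporary lexical each time.
my $upper = 'ABCDEFGH1234.sessiondata';
my $lower = 'abcdefgh1234.sessiondata';

cmpthese( -5, {
    check_first_tmp => sub {
        for my $id ($upper, $lower) {
            my $tmp = $id;                       # fresh lexical copy
            $tmp =~ tr/a-z/A-Z/ unless $tmp =~ /^[0-9A-Z]{12}\./;
        }
    },
    tr_always_tmp => sub {
        for my $id ($upper, $lower) {
            my $tmp = $id;                       # fresh lexical copy
            $tmp =~ tr/a-z/A-Z/;
            $tmp =~ /^[0-9A-Z]{12}\./;
        }
    },
} );
```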
Output:
Now the results match my expectation - performing a transliteration on everything is more expensive, but only if I use temporary lexical variables. Checking B::Concise doesn't shed much light (for me, anyhow):
I see one key difference in 2 <;> nextstate(main 3/4 -e:1 ->3 - is that it? I'm hoping someone with more internals-fu than I can shed some light on this - if only to sate my insatiable curiosity. --chargrill
In reply to Premature and micro optimization... by chargrill