Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

(OT) Don't blow that index dude.

by demerphq (Chancellor)
on Nov 29, 2005 at 09:00 UTC ( [id://512547]=perlmeditation: print w/replies, xml ) Need Help??

This isn't perl, its DB, but it happened to occur in perl code and in the monastery so I feel like I can somehow get away with posting it anyway. Apologies to you rabid "perl only" types out there. :-)

So you've spent some time getting your table sorted. You've indexed the important columns and you feel happy. One of your tables stores data about when an event occured, and you need to often find out things like "events in the past 10 minutes". You've indexed the tstamp field so you can query it efficiently and you write a query like this:

SELECT event_id, tstamp FROM events WHERE UNIX_TIME(tstamp) > UNIX_TIME(NOW())-600

The problem (if you havent already seen it) is that this query "blows the index", meaning that the DB can't use the index it has on tstamp to solve it. Why? Because of the UNIX_TIME(tstamp). The index on tstamp is in some internal date format, most likely not on a unix time. So in order to resolve this query the DB has to apply UNIX_TIME(tstamp) to each value in turn, and then compare it to the constant on the right hand side (most DB's will resolve expressions like this at query compile time, or in other words only once). Now its possible you are using a DB with a smarter optimiser, but there is also a good chance you aren't. A simple rewriting of this as

SELECT event_id, tstamp FROM events WHERE tstamp > DATE_SUB(NOW(),INTERVAL 10 MINUTE)

allows the DB to use the index and resolve the query as efficiently as possible. The moral is that indexes on fields are generally useless in a query when the where clause involves functions on those fields and not the fields compared to constants. That and always check how the DB will resolve your query and make sure its doing it in a way that makes sense. :-)

This code is/was in use in a number of places in the monastery. One example has to scan 400k records when it blows the index, and only 3k when it uses it. So that gives you an idea of the difference in performance.

So, don't blow that index dude. :-)

---
$world=~s/war/peace/g

Replies are listed 'Best First'.
Re: (OT) Don't blow that index dude.
by thor (Priest) on Nov 29, 2005 at 12:25 UTC
    PostgreSQL has a feature where you can index based on an expression. Details here.

    thor

    The only easy day was yesterday

      And, for reference, so does Oracle.

      /J

      True, but here (and in Oracle and other DBs with this feature), it's not magical. That is, you have to decide ahead of time exactly what expression you want to index on and then make your WHERE conditions query on exactly that expression. That said, especially for simple things like case-insensitive comparison (by creating an index on LOWER(somecol) ) this feature can offer an enormous performance benefit in many cases.
        That's the same for /any/ index. The difference between a functional and regular index, is how you specify it. If you create an index on a column, you're creating it on a column. If you are doing it on f(column), you're creating it on f(column).

        You always have to be specific. Can't ask for one thing and expect another. :)

        ----
        Give me strength for today.. I will not talk it away..
        Just for a moment.. It will burn through the clouds.. and shine down on me.

Re: (OT) Don't blow that index dude.
by rob_au (Abbot) on Nov 29, 2005 at 10:25 UTC

    There is a similar, more broad mantra which I have come to employ in some recent database work - Think about how you're actually going to use the data. Spending some time to address this question early in the database design process has allowed for some very significant improvements in some code which I am working on currently. In a similar fashion to demerphq, time stamp entries across a series of database tables in some code that I am working with is now stored in an unsigned integer as seconds since epoch - Furthermore, aided by the MySQL internal commands for address conversion, some network mapping database entries are now stored in long integer format and as such, some large portions of network address calculations in userspace have been able to be removed and replaced by internal MySQL address conversion functions. eg.

     

    perl -le "print unpack'N', pack'B32', '00000000000000000000001000000000'"

Re: (OT) Don't blow that index dude.
by tinita (Parson) on Nov 29, 2005 at 12:08 UTC
    been there, done that =)

    i find the EXPLAIN function in mysql extremely helpful as it can show you which index it will use.
    also, with USING (indexname) you can force mysql to use a specific index if it isn't clever enough to find it out by itself. don't know, which DBMS you're working on.

Re: (OT) Don't blow that index dude.
by VSarkiss (Monsignor) on Nov 29, 2005 at 15:20 UTC

    Just a note that index implementations vary widely from platform to platform. As others have pointed out above, PG and Oracle (among others) implement function indexes. Some others won't use an index on that query at all because of the less-than condition.

    The key thing to note is that indexes are not "magic". Some programmers seem to believe that whenever a column is involved in a query, adding an index will make the query faster. This is rarely true. As in optimizing your Perl code, you should measure, measure, and measure again, then make a choice based on that.

Re: (OT) Don't blow that index dude.
by pileofrogs (Priest) on Nov 30, 2005 at 20:43 UTC

    I'm not much of a DBA. I use MySQL and PG every now and again. Reading the above posts, I had a thought, and I assume I'm wrong, and if you can, I'd love it if someone would explain what's wrong with my thought.

    Achem...

    My Thought...

    What about creating yet another index with the timestamp in the format you need? It seems to me that you'd be trading a bunch of space for exactly the performance you're looking for. It also seems to me that the DBs that allow you to index on a function might just be doing exactly this under the covers.

    Thanks!

    --Pileofrogs

      Sure, thats one of the solutions. If you are going to do all your operations in unix time you might as well store it that way in the first place. The point of the thread is just to bring it to the attention of people that weren't aware of this kind of thing. As somebody else said its important to think about how you are going to use the data.

      ---
      $world=~s/war/peace/g

Re: (OT) Don't blow that index dude.
by Anonymous Monk on Nov 30, 2005 at 20:56 UTC
    So, don't blow that index dude. :-)

    My sex life is none of your business, you jerk! :-(
    ---
    Index Dude

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://512547]
Approved by Corion
Front-paged by friedo
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (4)
As of 2024-04-20 04:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found