http://qs321.pair.com?node_id=174709

Every few days somebody wanders in with a DBI problem involving hand-built queries. And just as often, someone will respond with the suggestion that they use bind parameters. This is almost always good advice, though it sets up a landmine for the unlucky soul who tries to bind a value for a LIKE clause.

Here's the problem: Parameters in SQL queries let you do two things. First, they let you avoid the details of quoting string values*, and avoid having to escape any characters within a string that the RDMBS wants to have escaped. This lets you avoid the problem of doing   $sth->prepare("SELECT * FROM companies WHERE name = '$name'"); where $name happens to be "O'Reilly". Oops. Malformed query.

Rewriting this as   $sth->prepare("SELECT * FROM companies WHERE name = ?"); lets DBI/DBD handle the quoting when a value for name is supplied at execute() time. You don't have to worry about the details, including portability details in the event that your code is set down atop a different RDBMS with different quoting rules.

The second benefit is performance, if you happen to be using a database that does its query parsing and execution at prepare() time. (Oracle does, MySQL doesn't (yet)). It takes time for a database to parse a query and figure out how to handle it (depending on the query, indexes can be used or not, and the order in which tables are scanned can matter). If you can take this hit once, you're almost always better off.

Here's the landmine.

Say you've coded up a search page that allows the user to specify the start of a name. You'll report on all matches. Instead of writing   $sth->prepare("SELECT name, url FROM table WHERE name LIKE '$name%'); you've taking the standard advice, and have instead written

$sth->prepare("SELECT name, url FROM table WHERE name LIKE ?"); $sth->execute($name . '%');
Knowing a bit about databases, you've built a separate index on table.name so that the search will go fast. This seems to work fine when you test, and so you ship. Eventually a customer gripes that your system is really slooooow. So slow that their browsers are timing out. They paid a bunch of money for Oracle, and (hopefully?) a bunch of money for your software, and are very, very unhappy.

What's going on?

Eventually a DBA looks at the query plan that Oracle is using for your query, and notices that it is peforming a linear scan of table, which, for this customer, is Very Big. You didn't notice this because your test data wasn't anywhere near as large (a problem worth its own rant). But why the linear scan? You have an index on the table.name. What's going on?

The problem is this: When the Oracle prepared the query and examined the LIKE clause, it could tell that there was an index on table.name. If Oracle had 'smith%' in hand at that point, it would have determined that the index was safe to use ('smith' could be used as a partial key). But without the argument in hand, the Oracle's query planner had to make a worst-case assumption. Worse case means a linear scan. Linears scans of large data tables suck.

If you'd built the query by hand, Oracle would have used your index, and the query would have been quick. But, since you took the standard advice and used bind parameters, performance sucked.

Now you know.

*crazyinsomniac reminds me that you can use DBI::quote() when hand building queries.

Replies are listed 'Best First'.
Re: Where the advice to use DBI bind parameters can go wrong (long)
by mpeppler (Vicar) on Jun 14, 2002 at 22:58 UTC
    Interesting observation.

    For Sybase the optimizer determines index usage when the query is run the first time. So if the first request only has a trailing '%' and the column has an index it will be used. A subsequent query with a leading '%' won't be incorrect - the optimizer detects the leading '%' and while it uses the index it will look at all the values for a match.

    However, if the first query has a leading '%' then you will get a table scan each time.

    I ran the following:

    my $sth = $dbh->prepare("select * from market where commodity like ?") +; $sth->execute("SPM%"); while(my $d = $sth->fetch) { ; } $sth->execute("%XXX%"); while(my $d = $sth->fetch) { ; }
    The showplan output shows that the index is used, but the stats IO shows that the second execute() call does a lot more work:
    Table: market scan count 1, logical reads: (regular=111 apf=0 total=11 +1), physical reads: (regular=0 apf=0 total=0), apf IOs used=0
    vs.
    Table: market scan count 1, logical reads: (regular=51889 apf=0 total= +51889), physical reads: (regular=0 apf=0 total=0), apf IOs used=0

    FWIW :-)

    Michael

Re: Where the advice to use DBI bind parameters can go wrong (long)
by perrin (Chancellor) on Jun 14, 2002 at 20:14 UTC
    There's probably a way to give Oracle an optimizer hint that would take care of this. I've seen DBAs use optimizer hints to make Oracle use certain indexes before.
      There's probably a way to give Oracle an optimizer hint that would take care of this. I've seen DBAs use optimizer hints to make Oracle use certain indexes before.

      Not in this case. Hints influence how a query gets executed, but not in ways (as far as I know) that would result in the answer being wrong. Honoring a "trust me, you can use an index" hint would result in a bogus result set if the value bound for a LIKE clause had a leading wildcard.

      Hints are great when a query plan is joining tables in a reasonable, but suboptimal, sequence. Any way you join should give you the same result set.

        We ran into this problem with some of our code. When our DBA talked with Oracle Tech Support, they said to use an index hint. The Oracle tech said that an index would then be used, unless the parameter started with wildcard. At that time it would recompile the query to not use the index. Oracle would then keep both versions of the query cached and would use the appropriate one. The hints are just that, hints. If it doesn't work then Oracle ignores them.
Re: Where the advice to use DBI bind parameters can go wrong (long)
by ignatz (Vicar) on Jun 14, 2002 at 20:44 UTC
    That's an interesting landmine. I doubt that I ever would have thought of it.

    I wonder how much pain is behind this node. I can almost feel the hours shaking one's fist at a monitor. Thanks for helping me avoid it.

    ()-()
     \"/
      `                                                     
    
Re: Where the advice to use DBI bind parameters can go wrong (long)
by kappa (Chaplain) on Jun 15, 2002 at 11:46 UTC
    The cool dragonfly book thoroughly describes the case of optimizer failure you mention. I generally advise the book to all database programmers whether they use Oracle or not.
Re: Where the advice to use DBI bind parameters can go wrong (long)
by hacker (Priest) on Jun 17, 2002 at 14:05 UTC
    For the record, I've never seen a mysql query with that syntax in production code (though this thread isn't limited to MySQL, I'm sure there are similar syntaxes in other RDBMS'). Normally I see the following:
    $sth->prepare(qq{ SELECT name, url FROM table WHERE name LIKE %$name%});

    This may not do what you expect:

    $sth->prepare("SELECT * FROM companies WHERE name = ?");

    I've been burned by the double-quotes in '$sth->prepare' also, so I have refrained from using them, especially with regard to their use with binds. In fact, your second example will not work with those bind parameters in double-quotes. All of the DBI books I've seen refer to qq// in these circumstances.

      hacker, the second example doesn't work because of the %$name being interpolated inside the double quotes. Single quotes is a saves a bit of typing over qq{}.

      As a solution to the initial problem pointed out by dws the work-around in Oracle is to use InterMedia indexes for full text searches of VARCHAR2 and LOB data. They can be a bit tricky to set up initially but do work well when you get them going (make sure you monitor the CTX server process).

      rdfield

Re: Where the advice to use DBI bind parameters can go wrong (long)
by demerphq (Chancellor) on Jun 17, 2002 at 13:45 UTC
    Note that when using Sybase binding columns often isnt the best plan at all.

    mpeppler has an article about it somewhere here.

    Yves / DeMerphq
    ---
    Writing a good benchmark isnt as easy as it might look.