Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Does DBI handle table alias?

by punch_card_don (Curate)
on Oct 06, 2004 at 16:02 UTC ( #397050=perlquestion: print w/replies, xml ) Need Help??

punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

This query:

$sql = "SELECT COUNT(*) FROM theTable A INNER JOIN theTable B ON (A.co +lumn_1 = $value_x and B.column_1 = $value_y AND A.column_2 = B.column +_2)";

hangs up until, it appears, the browser times out and reports a server error.

I wonder if it's the table aliases, or something else DBI doesn't like?

Thanks

janitored by ybiC: Balanced <code> tags around query

Replies are listed 'Best First'.
Re: Does DBI handle table alias?
by VSarkiss (Monsignor) on Oct 06, 2004 at 16:13 UTC

    It's highly, highly unlikely that DBI is the culprit. DBI doesn't interpret the SQL you pass through.

    Does your query work if you run it directly against the database (using mysql or sqlplus or whatever command-line tool applies to the platform you're using)? If the tables are huge, your query could take a very long time to process.

    Update
    Just as an aside, it's stylistically nicer to put your search conditions separately from your join criteria, such as:

    SELECT COUNT(*) FROM theTable A INNER JOIN theTable B ON A.column_2 = B.column_2 WHERE A.column_1 = ? AND B.column_1 = ?
    Also, notice I changed your string interpolations to placeholders. If your driver supports it, placeholders are definitely the way to go.

Re: Does DBI handle table alias?
by dragonchild (Archbishop) on Oct 06, 2004 at 17:11 UTC
    The prior responses, especially VSarkiss's, are absolutely correct. You will want to verify this query from whatever commandline tools your RDBMS provides. A few thoughts on how to speed up a query:
    • Verify that you have indices on the columns you're searching on. Also, you will want to verify that the indices are being used. DBD::mysql, for example, passes all parameters in quoted, so they all look as if they are strings to MySQL. This means that comparisons to numeric columns will not use any indices, even if they are available.
    • Put your search conditions in a WHERE clause, very much like VSarkiss suggested in his update. This allows the RDBMS to optimize the query better.
    • Depending on the size of your tables and the size of the machine, some queries might just run slowly. If you're working with 1G tables and you have 256M of RAM, you're going to have issues.

    Also, you might want to check the value of your webserver timeout. In Apache, the variable is called Timeout and is found in the httpd.conf. (I don't know about others, like IIS.) Some queries will simply take 1-2 minutes and if your timeout is set to 90 seconds (which isn't uncommon), you're going to have issues.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Does DBI handle table alias?
by jfroebe (Parson) on Oct 06, 2004 at 16:53 UTC

    Hi,

    The problem has absolutely nothing to do with DBI. The issue is a performance one with your chosen dbms. If you run the same query using the supplied query tool provided by your vendor (i.e. isql for Sybase ASE), you will see the same problem... namely the query is taking a long time to execute. Your browser or the webserver is timing out on your perl script which is still waiting for your query to return.

    How to fix? Depends on your DBMS. Do you have a covering index? Are the SARGS (search arguments) causing a table/index scan or what? This basically becomes a DBMS specific issue as to how to determine the real culprit. Check your dbms documentation

    update: DBI has no knowledge of an aliased table name as it just passes it to your dbms. I know of no DBD that will interfere with the use of an aliased table name (if the DBMS supports it).

    You will probably want to make sure the indexes are being used on the joining columns....

    Jason L. Froebe

    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Re: Does DBI handle table alias?
by jaco (Pilgrim) on Oct 06, 2004 at 16:20 UTC
    I'm going to agree. I've passed DBI things such as
    $yesterday = $ebh->prepare("select d1.nas_ip as d1ip, d1.date as d1dat +e, d2.nas_ip d2ip, d2.date as d2date from scratch as d1 left join sc +ratch1 as d2 on d1.nas_ip=d2.nas_ip");
    On tables over 1 gig and never had a problem with it. I'd look into the time
    it takes to execute, just like VSarkiss suggested
Re: Does DBI handle table alias?
by punch_card_don (Curate) on Oct 06, 2004 at 18:16 UTC
    Thanks, all, for the help & direction.

    I'm trying to put it into practice. Here's the situation:

    I have ~20,000 users, each reporting an average of 600 characteristics out of a possible 3,000 characteristics. The objective of this system is to count users who correspond to various collections of characteristics, in the general form:

    count all users who reported characteristics a, b &c

    I've created a two-column databse that looks like this:

    characteristic_id (int), user_id(int)

    with no indexes to start, as a baseline. So, for the average user, there are 600 rows, that makes 12-million rows in total.

    Following your suggestions, I connected by telnet to execute a few queries.

    select count(*) from theTable
    correctly reports 12,000,000 rows in 0.00s

    select count(*) from theTable where characteristic_id = 123
    reports in 0.02s

    At this point I'm thinking this is really great. But then

    select count(*) from theTable where user_id = 456
    reports in 10.77s

    And then applying VSarkiss' excellent style:

    select count(*) from theTable A inner join theTable B on A.user_id = B.user_id where A.characteristic_id = 123 and B.characteristic_id = 124
    reports in 3min 23s!

    So - in effect, nothing to do with Perl.

    But a new problem. It takes several dozen such queries to compile the report data I need. Clearly 3-minutes per query is unacceptable.

    Personally, I'm amazed by the magnitude of the increase in response time.

    Sooooooo - dare I hope to see a two-orders-of-magnitude decrease in response time by adding indexes? Or do you think my database design is fubar?

      You should investigate what kind query analysis tools your DBMS has. There should be a tool that will tell you what type of execution path your DBMS is planning for your query. Generally, these will help you determine where indexes are useful.

      If you have absolutely no indexes then for every query you are doing a full table scan instead of an index scan. So, adding a n index will improve performance dramatically. Most likely by at LEAST an order of magnitude and possibly even more than that.

        You should investigate what kind query analysis tools your DBMS has.

        In particular, check your RDBMS docs for "EXPLAIN PLAN". It's how you get the database to cough up the query execution plan, which will tell you things like where the database is having to do full table scans to satisfy your query. This'll give you suggestions about where indexes might help.

      If you add indices, you will see response times drop from 3min23s to under 10s, and probably closer to 1s. I have seen it happen. In other words, try it and see!

      Being right, does not endow the right to be rude; politeness costs nothing.
      Being unknowing, is not the same as being stupid.
      Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
      Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: Does DBI handle table alias?
by punch_card_don (Curate) on Oct 06, 2004 at 23:41 UTC
    Fortunately, those are the only two columns in the table, AND there will practically never be deletes or insertions. Once the data is loaded into the database, it never changes unless someone finds an error. So it looks like I'm on the right track.

    BUT - I launched

    create index col_1_idx on theTable(column_1)

    over two hours ago, and it's still chunking away.....

      Actually, it had completed and disconnected and I was too bleary-eyed to notice when I looked at the screen in passing...
Re: Does DBI handle table alias?
by punch_card_don (Curate) on Oct 06, 2004 at 19:54 UTC
    OK - this is going to be an interesting experiment. I'm in the process of adding indeices. I'm going to add different combinations of different kinds of indices and test each one. Will keep this topic posted.
      Be careful... you really don't want to create anymore indexes than necessary. Even though you might see a performance increase in your test by building a bunch of extra indexes you might end up causing performance problems in other parts of the system. Remember, that whenever a rows are inserted or deleted there is an additional work that has to be for each index. Just like everything else in programming you have to strike a balance.

      Looking at the query you had problems with I am guessing that the only indexes that are going to help are to build an index for user_id and characteristic_id for both tables. Generally, columns that appear in join conditions are good candidates for building an index.

Re: Does DBI handle table alias?
by punch_card_don (Curate) on Oct 07, 2004 at 19:16 UTC
    A new installement in the saga:

    I followed your advice and created indexes, one at a time.

    Having an index on each of the columns has reduced response time for the query:

    select count(*) from theTable A inner join theTable B where A.column_1 = 123 and B_column_1 = 456 and A.column_2 = B.column_2;

    to 30-seconds. A seven fold improvement over 3 1/2 minutes to be sure, but having dozens of these to run while an analyst waits for his page of data, noone's going to wait the 20-minutes or so that will take. The objectives call for 5 to 10 seconds to generate the whole page. I might get away with 15.

    Now, these were just plain indexes. Neither column has a clustered index on it, so that'll be the next thing I try, but I somehow doubt even a clustered index is going to reduce my time to the ~0.5 seconds I'm looking for.

    Any and all advice will be appreciated. Thanks.

      I am not exactly sure what you mean by a plain index, but I am pretty sure that adding a clustering index on a field that is already indexed will not really accomplish anything. I could be confusing myself though.

      At this point I think there are two places to looking. First, run the query through whatever explain utilities are available on your DBMS. This may give you some insight into what other optimizations may be useful. This was suggested in a previous post, but may have gotten lost in the shuffle. The other thing to consider is if you have any sort of hardware limitation. E.g. are you running out of RAM on that machine?

      After that, you are going to want to look at if there any database specific options/features you can turn on that may help your query. This type of thing is inherently database specific, and each database usually has a slew of different things you can try. You probably want to find some good perfomance tuning books/references for your database.

      I am a DBA neophyte, so I do not really know if your performance requirements of .5s are reasonable. I will say that 30s for that query sounds like a pretty long time to me.

Re: Does DBI handle table alias?
by runrig (Abbot) on Oct 07, 2004 at 19:47 UTC
    Just an idea. Are there any duplicate rows in the table? If not, how about, e.g.:
    select column_1 from theTable where column_2 in (123, 124) group by column_1 having count(*) = 2
    I am not sure an index would even help in this case, but I'd at least try one on column_2.

    Update: As indicated by tall_man below, the effectiveness of the index will depend on the distribution of the data in column_2.

      I don't know enough about the sorts of requests that might be made against your database, but

      1. if using MySQL 4+, have you turned on the query caching? and

      2. can you pre-populate or generate any reports off-line? i.e. if you know the reports that most users ask for, can you pre-build them at 1-6 a.m. (or whatever), when it does not matter if each query takes three minutes.

      Even running the queries will build the query cache for number 1.

      Finding the top 20% of the reports used 80% of the time means you can pre-build those or prime the cache.

      It all depends on your data and how dynamic the queries are.
      Another idea that might be worth trying is a subquery:
      select count(*) from theTable where column1 in (select column1 from theTable where column2 = 123) and column2 = 124
      But the other suggestion would probably scale better for bigger groups of attributes.

      Update: By the way, practically everyone who answered is giving you the wrong impression about indexes. Sometimes sequential scans are faster than using indexes, for example if the attribute being tested is a match for a large percentage of the records. That's what the query optimizer is supposed to do on a query-by-query basis: decide whether to use the indexes or not, and in what sequence. In the subquery I suggested above, the optimizer might not be smart enough to do it, so you might have to create the query such that the smallest subset will be selected by the inner query.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://397050]
Approved by Arunbear
Front-paged by htoug
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (5)
As of 2022-06-28 00:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My most frequent journeys are powered by:









    Results (89 votes). Check out past polls.

    Notices?