PerlMonks  

s/Perl/SQL/ ?

by Freezer (Sexton)
on Sep 10, 2012 at 13:40 UTC ( [id://992760]=perlquestion )

Freezer has asked for the wisdom of the Perl Monks concerning the following question:

The SQL below makes sense. However, it is extremely slow. Am I right in thinking that a MySQL join could be used instead, and that the code could dispense with the embedded Perl?
my $sql_count_A_Ref_vs_Comp = "
    SELECT '".$Reference."', '".$Comparitor."', A.entity_name
    FROM e_annotation_090812.annotation A
    WHERE A.user IN (
        SELECT T.Line FROM e_annotation_090812.Temp_table T
    )
    AND A.entity_name IN (
        SELECT B.entity_name
        from e_annotation_090812.annotation B
        WHERE B.user IN (
            SELECT U.Line FROM e_annotation_090812.Temp_table U
        )
        AND B.entity_name LIKE '_%'
        AND B.evidence_code NOT LIKE '%_________8__'
        AND B.centre LIKE '".$Reference."'
    )
    AND A.entity_name LIKE '_%'
    AND A.evidence_code NOT LIKE '%_________8__'
    AND A.centre LIKE '".$Comparitor."'
";

Replies are listed 'Best First'.
Re: s/Perl/SQL/ ?
by Corion (Patriarch) on Sep 10, 2012 at 13:47 UTC

    Where is Perl embedded in your query? What part of constructing the string is slow?

    Maybe you want to talk to your database administrator about the data model and how best to access it? Most databases, and especially MySQL, are bad at LIKE queries (and, though I haven't benchmarked it, probably even worse at NOT LIKE), because there are usually no indices that handle substring or suffix matches well.

    My rough recommendation, without knowing your data model, is to read Use the Index Luke, and then to set up some benchmarks/EXPLAIN queries to find out how your data model can be improved by indices. Most likely, adding a trigger and separate column for the "8" will improve things.

Re: s/Perl/SQL/ ?
by roboticus (Chancellor) on Sep 10, 2012 at 14:22 UTC

    Freezer:

    A join might be helpful, as you're guessing. Considering that it's using the same conditionals and tables, it would likely be good to refactor the SQL a bit. On first glance, it looks like you're using extremely similar conditions, so you could perhaps change it to:

    SELECT '".$Reference."', '".$Comparitor."', A.entity_name
    FROM e_annotation_090812.annotation A
    join e_annotation_090812.Temp_table T on T.Line=A.user
    where A.entity_name like '_%'
    and A.evidence_code not like '%_________8__'
    and (A.centre like '".$Reference."' or A.centre like '".$Comparitor."')

    I don't use mysql, but if it's like Oracle, then the "_" is a wildcard character, so effectively the first like expression is merely checking that the string is at least one character long, and the second checks that the string is at least 12 characters long with an 8 as the third-from-last character. The final like clauses could easily be converted to simple comparisons. So you could possibly gain a bit of speed (be sure to benchmark it!) by just doing the explicit checks:

    SELECT '".$Reference."', '".$Comparitor."', A.entity_name
    FROM e_annotation_090812.annotation A
    join e_annotation_090812.Temp_table T on T.Line=A.user
    where length(A.entity_name)>0
    and ( length(A.evidence_code)<12
       or substring(A.evidence_code,length(A.evidence_code)-2,1)<>'8' )
    and (A.centre = '".$Reference."' or A.centre = '".$Comparitor."')
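As a quick sanity check on the reasoning above, here is a small sketch in plain Perl (no database involved) that compares a regex stand-in for `LIKE '%_________8__'` with the explicit length/substring test. The sub names and sample strings are made up for illustration, and the regex only approximates LIKE semantics (no collation or NULL handling).

```perl
use strict;
use warnings;

# Perl stand-in for SQL:  code LIKE '%_________8__'
# '%' is any run of characters, '_' is exactly one character.
sub like_pattern {
    my ($code) = @_;
    return $code =~ /\A.*.{9}8.{2}\z/s ? 1 : 0;
}

# The explicit form: at least 12 characters, with '8' third from the end.
sub explicit_check {
    my ($code) = @_;
    return (length($code) >= 12 && substr($code, -3, 1) eq '8') ? 1 : 0;
}

# Made-up sample values, not taken from the real table.
my @samples = ('', '8xx', 'abcdefghi8xx', 'abcdefgh8xx',
               'abcdefghijk8xx', 'abcdefghi9xx');

for my $code (@samples) {
    printf "%-18s like=%d explicit=%d\n",
        "'$code'", like_pattern($code), explicit_check($code);
}
```

The SQL uses NOT LIKE, so the WHERE clause is the negation of `explicit_check`: shorter than 12 characters, or something other than '8' in that position. One thing to watch on the MySQL side: LENGTH() counts bytes, so for multi-byte character data CHAR_LENGTH() would be the closer match to what LIKE does.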

    Finally, to get rid of the trivial amount of embedded perl, you could use placeholders:

    my $ST = $DB->prepare(<<EOSQL);
    SELECT ?, ?, A.entity_name
    FROM e_annotation_090812.annotation A
    join e_annotation_090812.Temp_table T on T.Line=A.user
    where length(A.entity_name)>0
    and ( length(A.evidence_code)<12
       or substring(A.evidence_code,length(A.evidence_code)-2,1)<>'8' )
    and (A.centre=? or A.centre=?)
    EOSQL
    $ST->execute($Reference, $Comparitor, $Reference, $Comparitor);

    Update: Changed != to <> and substr to substring, as described by Anonymous Monk in Re^4: s/Perl/SQL/ ?. I didn't *fully* fix substring, as I've never seen nor tried the form "substring(A.foo from 1 for 2)". I don't doubt that it's correct (I even double-checked the SQL92 and SQL2002 docs referenced at http://savage.net.au/SQL/). It's just too ugly for me to consider. ;^)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      This bit of code (update: shown above) looks very interesting. Can anyone convert the SQL easily into MySQL speak? Am I right in thinking that the syntax shown is Oracle?
      my $ST = $DB->prepare(<<EOSQL);
      SELECT ?, ?, A.entity_name
      FROM e_annotation_090812.annotation A
      join e_annotation_090812.Temp_table T on T.Line=A.user
      where length(A.entity_name)>0
      and ( length(A.evidence_code)<12
         or substr(A.evidence_code,length(A.evidence_code)-2,1)!='8') )
      and (A.centre=? or A.centre=?)
      EOSQL
      $ST->execute($Reference, $Comparitor, ".$Reference.", ".$Comparitor.")
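One detail in the execute() call quoted above is worth spelling out: `".$Reference."` is valid Perl, but it is a double-quoted string whose dots are literal characters, so the bound value would not equal the centre name. A minimal sketch, with made-up centre names:

```perl
use strict;
use warnings;

my $Reference  = 'CentreA';   # made-up value
my $Comparitor = 'CentreB';   # made-up value

# What the quoted execute() call passes for the 3rd and 4th placeholders:
# double-quoted strings in which the dots are literal characters.
my @as_posted = ($Reference, $Comparitor, ".$Reference.", ".$Comparitor.");

# What the placeholders need: the bare scalars.
my @intended  = ($Reference, $Comparitor, $Reference, $Comparitor);

print "as posted: @as_posted\n";   # CentreA CentreB .CentreA. .CentreB.
print "intended:  @intended\n";    # CentreA CentreB CentreA CentreB
```

With placeholders the driver handles the quoting, so the variables go in as they are; the dot-and-quote dance is only needed when splicing values into the SQL string itself.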

        Freezer:

        I used commonly-used[1] SQL constructs, so[2] it should be just fine. I used MySQL about 10 years ago, and I seem to recall that it was fairly standard SQL, so I would expect it to work. (Most of the things I remember as lacking (such as nested queries) were added to MySQL years ago.)

        I only mentioned Oracle to let you know that I'm not current on MySQL and that you *might* need to tweak it.

        However, as Corion mentioned in his initial response, there's no substitute for using indexes, benchmarking and any statement analysis feedback your database offers. After all, how the database decides to execute your SQL statement may be markedly different than what you might imagine. For example, I *expect* that the comparison / length / substring may be faster than like. But I have no idea what MySQL's opinion may turn out to be.

        Update: Updated in response to AM's reply. Thanks! [1] I changed "standard SQL" to "commonly-used SQL", and [2] removed "if MySQL uses standard SQL".

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        What makes you think that the SQL as shown is specific to Oracle?

        Also, why are you posting nonsensical syntax errors in what looks like it could be meant as Perl code?

        Maybe you want to tell us what syntax errors you actually get?

Re: s/Perl/SQL/ ?
by CountZero (Bishop) on Sep 10, 2012 at 20:05 UTC
    A query that contains LIKE search criteria that start with a wildcard cannot benefit from indexes for that part of the search. The query engine will have to do a slow traversal through the whole of the database (or at least the relevant part of the database, depending on the other search criteria) and do a regex or substring search on that field for each record.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
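CountZero's point about leading wildcards can be pictured with a toy model in plain Perl: a sorted list plays the part of an index. This is only an illustration of the idea, not a description of how MySQL implements its indexes, and the word list and sub names are invented.

```perl
use strict;
use warnings;

# A sorted list standing in for an index on one column (toy data).
my @index = sort qw(alpha alpine beta betray delta gamma omega zeta);

# Prefix search (LIKE 'bet%'): binary-search to the first candidate,
# then read forward while the prefix still matches.
sub prefix_search {
    my ($prefix) = @_;
    my ($lo, $hi, $probes) = (0, scalar @index, 0);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        $probes++;
        if ($index[$mid] lt $prefix) { $lo = $mid + 1 } else { $hi = $mid }
    }
    my @hits;
    while ($lo < @index && index($index[$lo], $prefix) == 0) {
        push @hits, $index[$lo++];
        $probes++;
    }
    return (\@hits, $probes);
}

# Suffix search (LIKE '%ta'): sort order says nothing about endings,
# so every entry has to be examined.
sub suffix_search {
    my ($suffix) = @_;
    my $probes = 0;
    my @hits = grep { $probes++; /\Q$suffix\E\z/ } @index;
    return (\@hits, $probes);
}

my ($p_hits, $p_probes) = prefix_search('bet');
my ($s_hits, $s_probes) = suffix_search('ta');
print "prefix 'bet%': @$p_hits ($p_probes entries examined)\n";
print "suffix '%ta':  @$s_hits ($s_probes entries examined)\n";
```

On eight entries the difference is small, but the suffix search always touches every entry, while the prefix search grows roughly with the logarithm of the list size plus the number of matches.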
Re: s/Perl/SQL/ ?
by Anonymous Monk on Sep 10, 2012 at 16:50 UTC
    The EXPLAIN verb is your best friend. This will tell you exactly how the SQL engine proposes to determine the answers that you seek. Efficiency is very much a matter of exactly how you say it, and this tangle of subqueries is likely to be profoundly expensive in any system at all.
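For anyone who wants to try EXPLAIN from Perl, a rough sketch follows. It assumes `$dbh` is an already-connected DBI handle to a MySQL database; `explain_lines` is a made-up helper name, and the hash keys are the column names MySQL's EXPLAIN output uses.

```perl
use strict;
use warnings;

# Run EXPLAIN on a statement and return one summary line per plan row.
# $dbh is assumed to be a connected DBI handle for a MySQL database;
# the keys below are column names from MySQL's EXPLAIN output.
sub explain_lines {
    my ($dbh, $sql, @bind) = @_;
    my $plan = $dbh->selectall_arrayref("EXPLAIN $sql", { Slice => {} }, @bind)
        or die "EXPLAIN failed: " . $dbh->errstr;
    return map {
        my $row = $_;
        join ' ', map { "$_=" . ($row->{$_} // 'NULL') }
            qw(select_type table type key rows Extra);
    } @$plan;
}
```

Usage would be something like `print "$_\n" for explain_lines($dbh, $sql, @bind);`. A row showing `type=ALL` with `key=NULL` on a large table means a full table scan, which is the first thing to look for in the original query.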
Re: s/Perl/SQL/ ?
by Neighbour (Friar) on Sep 11, 2012 at 08:37 UTC
    Let's see if I can still do this:
    my $ar_data = $dbh->selectall_arrayref(qq (
        SELECT noot.Reference, aap.Comparitor, aap.entity_name
        FROM (
            SELECT ? AS Comparitor, entity_name
            FROM e_annotation_090812.annotation AS A
            JOIN e_annotation_090812.Temp_table AS T ON A.user = T.Line
            WHERE IFNULL(A.entity_name, '') <> ''
            AND A.evidence_code NOT LIKE '%_________8__'
            AND A.centre = ?
        ) AS aap
        JOIN (
            SELECT ? AS Reference, entity_name
            FROM e_annotation_090812.annotation AS B
            JOIN e_annotation_090812.Temp_table AS U ON B.user = U.Line
            WHERE IFNULL(B.entity_name, '') <> ''
            AND B.evidence_code NOT LIKE '%_________8__'
            AND B.centre = ?
        ) AS noot ON aap.entity_name = noot.entity_name
    ), { Slice => {} }, $Comparitor, $Comparitor, $Reference, $Reference)
        or die("Error executing query: " . $dbh->errstr);
    Note that this is untested :)
    Feedback is appreciated

    Edit: Could you provide a small portion of (dummy) testdata to work with, as well as the DDL (the result of SHOW CREATE TABLE e_annotation_090812.annotation) of both tables?
    This is handy to find further optimizations (for example, if evidence_code is a numerical field, using LIKE is very bad as it needs to typecast and thus can't use an index if present).

    Edit2: Changed centre LIKE ? to centre = ? which should be better unless you're using wildcards (_ or %) in $Reference or $Comparitor
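Before swapping one formulation for another, it may help to check that they return the same rows. The sketch below mimics the three shapes seen in this thread on a handful of made-up rows in plain Perl (no database, filters other than centre left out): the original IN subquery, a JOIN between the two centres, and a single pass with OR on centre.

```perl
use strict;
use warnings;

# Toy rows standing in for the annotation table (made-up data).
my @annotation = (
    { centre => 'Ref',  entity_name => 'geneA' },
    { centre => 'Ref',  entity_name => 'geneA' },   # same name twice at Ref
    { centre => 'Ref',  entity_name => 'geneB' },
    { centre => 'Comp', entity_name => 'geneA' },
    { centre => 'Comp', entity_name => 'geneC' },
);

my @ref  = grep { $_->{centre} eq 'Ref'  } @annotation;
my @comp = grep { $_->{centre} eq 'Comp' } @annotation;

# IN (subquery): keep each Comp row whose name occurs at Ref at least once.
my %at_ref = map { $_->{entity_name} => 1 } @ref;
my @via_in = map  { $_->{entity_name} }
             grep { $at_ref{ $_->{entity_name} } } @comp;

# JOIN: one output row per matching (Comp row, Ref row) pair.
my @via_join;
for my $c (@comp) {
    for my $r (@ref) {
        push @via_join, $c->{entity_name}
            if $c->{entity_name} eq $r->{entity_name};
    }
}

# Single pass with OR on centre: every row from either centre.
my @via_or = map  { $_->{entity_name} }
             grep { $_->{centre} eq 'Ref' || $_->{centre} eq 'Comp' } @annotation;

print "IN:   @via_in\n";     # geneA
print "JOIN: @via_join\n";   # geneA geneA
print "OR:   @via_or\n";     # geneA geneA geneB geneA geneC
```

On this toy data the JOIN repeats a name once per matching Reference row, so SELECT DISTINCT (or a GROUP BY) may be needed to get the same result as IN. The OR form answers a different question: names seen at either centre rather than at both. Whether either difference matters depends on the real data, so comparing row counts alongside the timings seems prudent.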
