Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Re^3: Efficient search through a huge dataset

by pg (Canon)
on Oct 20, 2004 at 02:11 UTC ( [id://400738]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Efficient search through a huge dataset
in thread Efficient search through a huge dataset

Millions of records? Not true at all!

You can do simply one query, say you want to find out all records that are in table1 but not table2, then you can either:

select blah from table1 t1 where not exists (select 1 from table2 t2 where t1.blah = t2.blah )

Or (if munus is supported):

select blah from table1 minus select blah from table2

It should be easy for you to modify the query a bit to satisfy your needs.

Replies are listed 'Best First'.
Re^4: Efficient search through a huge dataset
by dragonchild (Archbishop) on Oct 20, 2004 at 02:54 UTC
    Faster and better would be outer joins
    SELECT t1.blah FROM t1 LEFT OUTER JOIN t2 ON (t1.blah = t2.blah) WHERE t1.blah IS NOT NULL AND t2.blah IS NULL

    It's the correct way to rewrite the NOT EXISTS (about 100x faster in my tests) and is faster than the MINUS because it doesn't have to pull both result sets into memory, then do an expensive ARRAYCOMPARE on them.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      SQLite doesn't do the full outer join thing, so you have to fake it with two left outer joins :

      -- in table2 but not in table1: SELECT r.id FROM table2 AS r LEFT JOIN table1 AS l ON r.id = l.id WHERE r.system = ? and l.id IS NULL and r.id IS NOT NULL -- in table1 but not in table2: SELECT l.id FROM table2 AS l LEFT JOIN table1 AS r ON l.id = r.id WHERE l.system = ? and l.id IS NOT NULL and r.id IS NULL SQL

      But other than that, SQLite is up to the task, as it includes a nice bulk import statement (SQLite 2.x) and a bulk loader program (SQLite 3.x).

      Update: Added the "full" to "full outer join", as a "left (outer) join" is an outer join too, but one which SQLite does...

        Umm ... Corion? That is an outer join. SQLite just doesn't require the OUTER keyword. :-)

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://400738]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (7)
As of 2024-04-19 09:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found