Re^2: Efficient search through a huge dataset

Replies are listed 'Best First'.
Re^3: Efficient search through a huge dataset by pg (Canon) on Oct 20, 2004 at 02:11 UTC
Millions of records? Not true at all! You can do simply one query, say you want to find out all records that are in table1 but not table2, then you can either: `select blah from table1 t1 where not exists (select 1 from table2 t2 where t1.blah = t2.blah )` [download] Or (if munus is supported): `select blah from table1 minus select blah from table2` [download] It should be easy for you to modify the query a bit to satisfy your needs.	[reply] [d/l] [select]
Re^4: Efficient search through a huge dataset by dragonchild (Archbishop) on Oct 20, 2004 at 02:54 UTC
Faster and better would be outer joins `SELECT t1.blah FROM t1 LEFT OUTER JOIN t2 ON (t1.blah = t2.blah) WHERE t1.blah IS NOT NULL AND t2.blah IS NULL` [download] It's the correct way to rewrite the NOT EXISTS (about 100x faster in my tests) and is faster than the MINUS because it doesn't have to pull both result sets into memory, then do an expensive ARRAYCOMPARE on them. Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.	[reply] [d/l]
Re^5: Efficient search through a huge dataset by Corion (Patriarch) on Oct 20, 2004 at 11:39 UTC
SQLite doesn't do the full outer join thing, so you have to fake it with two `left outer join`s : `-- in table2 but not in table1: SELECT r.id FROM table2 AS r LEFT JOIN table1 AS l ON r.id = l.id WHERE r.system = ? and l.id IS NULL and r.id IS NOT NULL -- in table1 but not in table2: SELECT l.id FROM table2 AS l LEFT JOIN table1 AS r ON l.id = r.id WHERE l.system = ? and l.id IS NOT NULL and r.id IS NULL SQL` [download] But other than that, SQLite is up to the task, as it includes a nice bulk import statement (SQLite 2.x) and a bulk loader program (SQLite 3.x). Update: Added the "full" to "full outer join", as a "left (outer) join" is an outer join too, but one which SQLite does...	[reply] [d/l]
Re^6: Efficient search through a huge dataset by dragonchild (Archbishop) on Oct 20, 2004 at 13:28 UTC


Clear questions and runnable code get the best and fastest answer
	PerlMonks