I put together an example in case you want to use PostgreSQL:
The file I used is available here:
ftp://ftp.ncbi.nih.gov/genbank/livelists
It's similar to yours; but it has three columns.
I unzipped it, and put it into postgres, in a table t; there are more than 223-million rows.
$ ls -lh GbAccList.0206.2011
-rw-rw-r-- 1 aardvark aardvark 4.6G Feb 8 17:21 GbAccList.0206.2011
$ head -n 3 GbAccList.0206.2011
AACY024124353,1,129566152
AACY024124495,1,129566175
AACY024124494,1,129566176
$ time < GbAccList.0206.2011 psql -qc "
create table t (c text, i1 integer, i2 integer);
copy t from stdin csv delimiter E',';"
real 3m47.448s
$ time echo "
create index t_i2_idx on t (i2);
analyze t;" | psql -q
real 5m50.291s
Searches are now around a tenth of a millisecond:
# 5 'random' searches like:
echo "explain analyze
select * from t
where i2 = $gi;" | psql
Just showing the timings of five searches:
Index Cond: (i2 = 2017697) Total runtime: 0.157 ms
Index Cond: (i2 = 6895719) Total runtime: 0.109 ms
Index Cond: (i2 = 3193323) Total runtime: 0.119 ms
Index Cond: (i2 = 8319666) Total runtime: 0.091 ms
Index Cond: (i2 = 1573171) Total runtime: 0.119 ms
Of course, performance depends on the hardware used.
(a similar problem/solution here: Re^3: sorting very large text files (slander))
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.