http://qs321.pair.com?node_id=650488


in reply to how to improve this?

You might find Bio::Grep useful. The fastest solution for your task is probably an enhanced suffix array (for a very short introduction to simple suffix arrays see http://en.wikipedia.org/wiki/Suffix_array). Constructing it takes a few minutes (and probably a lot of RAM), but afterwards you can search for exact matches in O(m), where m is the query length.
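To make the idea concrete, here is a minimal sketch of a plain (not enhanced) suffix array in Perl. It is not how Bio::Grep or its back-ends work internally, and the names build_suffix_array and find_exact are made up for this example. The construction is naive, and the binary search costs O(m log n) rather than the O(m) of an enhanced suffix array, so treat it as an illustration only.

```perl
use strict;
use warnings;

# Build a plain suffix array: the start positions of all suffixes of $text,
# sorted lexicographically. Naive construction, for illustration only.
sub build_suffix_array {
    my ($text) = @_;
    my @sa = sort { substr($text, $a) cmp substr($text, $b) }
             0 .. length($text) - 1;
    return \@sa;
}

# Return the start positions (ascending) of all exact matches of $query.
sub find_exact {
    my ($text, $sa, $query) = @_;
    my $m = length $query;
    return () if $m == 0;

    # Binary search for the first suffix whose first $m characters
    # are not lexicographically smaller than the query.
    my ($lo, $hi) = (0, scalar @$sa);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if (substr($text, $sa->[$mid], $m) lt $query) {
            $lo = $mid + 1;
        }
        else {
            $hi = $mid;
        }
    }

    # All matching suffixes are adjacent in the suffix array.
    my @hits;
    while ($lo < @$sa && substr($text, $sa->[$lo], $m) eq $query) {
        push @hits, $sa->[$lo];
        $lo++;
    }
    my @sorted = sort { $a <=> $b } @hits;
    return @sorted;
}

my $text = 'GATTACAGATTACA';
my $sa   = build_suffix_array($text);
print join(',', find_exact($text, $sa, 'ATTA')), "\n";   # prints 1,8
```

The index is built once per sequence database and then reused for every query, which is where the speed-up comes from.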

Bio::Grep also supports the small tool GUUGle, which is slower but requires no precomputation and needs less RAM. In addition, GUUGle supports GU wobble pairs.

Update: If you don't want to use this module, then you should at least fetch your queries once and then use index or, perhaps better, a regex.
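A rough sketch of what I mean, assuming the queries fit in memory. The @queries list and the helper names positions_with_index, build_query_regex and scan_with_regex are invented for this example; in a real script the queries would be read from your file or database once, before the loop over the sequences.

```perl
use strict;
use warnings;

# Assumed input: loaded once, up front, instead of once per sequence.
my @queries = ('ATTA', 'ACA');

# Every start position of $query in $seq, found with index.
sub positions_with_index {
    my ($seq, $query) = @_;
    my @pos;
    return @pos if $query eq '';
    my $from = 0;
    while ((my $p = index($seq, $query, $from)) >= 0) {
        push @pos, $p;
        $from = $p + 1;    # step by one so overlapping hits are found too
    }
    return @pos;
}

# One precompiled alternation for all queries. The lookahead makes the
# match zero-length, so overlapping hits are reported as well. Only the
# first alternative that matches at a given position is captured.
sub build_query_regex {
    my @q = @_;
    my $alt = join '|', map { quotemeta } @q;
    return qr/(?=($alt))/;
}

# Return a list of [position, matched query] pairs for one sequence.
sub scan_with_regex {
    my ($seq, $re) = @_;
    my @hits;
    while ($seq =~ /$re/g) {
        push @hits, [ $-[0], $1 ];
    }
    return @hits;
}

my $seq = 'GATTACAGATTACA';
my $re  = build_query_regex(@queries);    # compiled once, reused per sequence

for my $q (@queries) {
    print "$q: ", join(',', positions_with_index($seq, $q)), "\n";
}
print join(' ', map { "$_->[0]:$_->[1]" } scan_with_regex($seq, $re)), "\n";
```

With index you scan the sequence once per query; with the combined regex you scan it once for all queries, which is why the regex may be the better choice when there are many queries.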