http://qs321.pair.com?node_id=742277

morgon has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

assume I have a large collection of strings (let's say a million of them) each associated with a timestamp.

I now want to be able to query this collection for all strings matching a given regex, possibly constrained by an upper and/or lower limit on the associated timestamp. For example, one query would be "find all strings matching /abc.*/", and another would be "find all strings matching /x*y/ whose associated timestamps fall within the last week".
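To make the shape of such a query concrete, here is a minimal sketch of the brute-force baseline: a linear scan over every record. The `@records` layout (one `[timestamp, string]` pair per entry), the sample data and the name `query` are assumptions made up for illustration, not an existing API.

```perl
use strict;
use warnings;

# Hypothetical in-memory layout: one [timestamp, string] pair per record.
my @records = (
    [ 1_700_000_000, 'abcdef' ],
    [ 1_700_086_400, 'xxy'    ],
    [ 1_700_172_800, 'zabc'   ],
    [ 1_700_259_200, 'y'      ],
);

# query($re, $from, $to): return every string that matches $re and whose
# timestamp lies in [$from, $to]. Either bound may be undef for "no limit".
sub query {
    my ($re, $from, $to) = @_;
    my @hits;
    for my $rec (@records) {
        my ($ts, $str) = @$rec;
        next if defined $from && $ts < $from;   # too old
        next if defined $to   && $ts > $to;     # too new
        push @hits, $str if $str =~ $re;        # unanchored regex match
    }
    return @hits;
}

my @all    = query(qr/abc.*/);                              # no time limits
my @ranged = query(qr/x*y/, 1_700_050_000, 1_700_200_000);  # both limits
```

Note that the patterns are unanchored, so /abc.*/ also matches 'zabc'. Whether a plain scan like this is fast enough for a million strings would have to be measured on the real data.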

Evidently I could put all the data into a database and use SQL for the queries, but I wonder whether there is a good algorithm for building a suitable index for such queries so that all the querying can be done in pure Perl, in such a way of course that answering a query does not take more than a few seconds.
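For the timestamp constraint at least, one simple kind of index is to keep the records sorted by timestamp and binary-search for the boundaries, so that the regex only runs over the matching time window. The sketch below assumes integer timestamps and a pre-sorted `@sorted` array; the names `lower_bound` and `range_query` are hypothetical. It does nothing to speed up the regex match itself.

```perl
use strict;
use warnings;

# Assumed layout: [timestamp, string] pairs, pre-sorted by timestamp ascending.
my @sorted = (
    [ 10, 'abc' ], [ 20, 'xy' ], [ 30, 'abcd' ], [ 40, 'y' ], [ 50, 'abz' ],
);

# lower_bound($ts): index of the first record with timestamp >= $ts,
# or scalar(@sorted) if there is none.
sub lower_bound {
    my ($ts) = @_;
    my ($lo, $hi) = (0, scalar @sorted);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted[$mid][0] < $ts) { $lo = $mid + 1 }
        else                        { $hi = $mid }
    }
    return $lo;
}

# range_query($re, $from, $to): regex scan restricted to timestamps in
# [$from, $to]. Integer timestamps are assumed, hence the "$to + 1".
sub range_query {
    my ($re, $from, $to) = @_;
    my $start = lower_bound($from);
    my $end   = lower_bound($to + 1) - 1;   # last index with timestamp <= $to
    return map { $_->[1] } grep { $_->[1] =~ $re } @sorted[ $start .. $end ];
}

my @in_window = range_query(qr/abc/, 20, 40);   # only 'abcd' is in the window
```

The binary search costs O(log n) per boundary, so a narrow time window reduces the number of regex matches accordingly; a query without time limits still scans everything.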

If building an index that supports arbitrary regexes is too difficult, I could make do with shell-style globbing.
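By shell-style globbing I mean roughly the following, shown here as a translation into an anchored regex. This is a minimal sketch that handles only `*` and `?`; character classes and brace expansion are deliberately left out, and `glob_to_regex` is a hypothetical helper name.

```perl
use strict;
use warnings;

# glob_to_regex($glob): turn a shell-style pattern into an anchored regex.
# '*' matches any run of characters, '?' matches exactly one character,
# and everything else is taken literally.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        $re .= $ch eq '*' ? '.*'
             : $ch eq '?' ? '.'
             :              quotemeta($ch);
    }
    return qr/\A$re\z/s;
}

my $prefix = glob_to_regex('abc*');
print "match\n" if 'abcdef' =~ $prefix;   # prints "match"
```

Unlike the unanchored regex queries, a glob has to match the whole string, so 'abc*' matches 'abcdef' but not 'zabc'.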

Any ideas?