Creating an index on a string-collection
by morgon (Priest) on Feb 08, 2009 at 18:47 UTC
morgon has asked for the wisdom of the Perl Monks concerning the following question:
Assume I have a large collection of strings (say, a million of them), each associated with a timestamp.
I now want to be able to query this collection for all strings matching a given regex, possibly constrained by upper and/or lower limits on the associated timestamp. For example, one query would be "find all strings matching /abc.*/"; another would be "find all strings matching /x*y/ where the associated timestamp falls within the last week".
Evidently I could put all the data into a database and use SQL for the queries, but I wonder whether there is a good algorithm for building a suitable index for such queries and doing all the querying in pure Perl, in such a way, of course, that answering a query does not take more than a few seconds.
If building an index that supports arbitrary regexes is too difficult, I could make do with shell-style globbing.
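To make the shape of the query concrete, here is a minimal brute-force baseline, not an index for the regex part. It assumes the records are held in memory as `[timestamp, string]` pairs with integer timestamps; the names `@records`, `lower_bound` and `query` are made up for illustration. Sorting by timestamp lets a binary search narrow the time window, after which the regex is simply tried against every string left in that window.

```perl
use strict;
use warnings;

# Hypothetical data layout: [timestamp, string] pairs, sorted once by
# timestamp so that a time window maps to a contiguous slice.
my @records = sort { $a->[0] <=> $b->[0] } (
    [ 100, 'abcdef' ],
    [ 200, 'xxy'    ],
    [ 300, 'zabc'   ],
    [ 400, 'y'      ],
    [ 500, 'hello'  ],
);

# Binary search: index of the first record whose timestamp is >= $ts.
sub lower_bound {
    my ($recs, $ts) = @_;
    my ($lo, $hi) = (0, scalar @$recs);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($recs->[$mid][0] < $ts) { $lo = $mid + 1 }
        else                        { $hi = $mid }
    }
    return $lo;
}

# $from and $to are optional inclusive bounds; undef means unbounded.
# Assumes integer timestamps (the upper bound is found via $to + 1).
sub query {
    my ($recs, $re, $from, $to) = @_;
    my $start = defined $from ? lower_bound($recs, $from)   : 0;
    my $end   = defined $to   ? lower_bound($recs, $to + 1) : scalar @$recs;
    # Linear regex scan over the records inside the time window only.
    return map  { $_->[1] }
           grep { $_->[1] =~ $re }
           @{$recs}[ $start .. $end - 1 ];
}

my @all  = query(\@records, qr/abc.*/);        # no time constraint
my @week = query(\@records, qr/x*y/, 150, 450); # bounded on both sides
```

With no timestamp limits this still scans every string, so it only shows what an index would have to beat; the timestamp part is the easy half of the problem.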