johnnywang has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I'm looking for wisdom on the following searching/matching problem. I have two files, each containing some records (say, one string per line). I'd like to go through the second file and find out whether each record is in the first file. One easy way to do it is to load the first file into a hash and then iterate through the second. My problem is that the files are huge, and putting the whole thing in memory is pushing the limits. Is there a less resource-demanding approach? Of course, speed is also a great concern. (The files currently contain about 10 million records and are growing. If the whole thing can be done within a few hours to a day, that would be fine.) Thanks.
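For reference, the hash-based approach described above might look roughly like the following. This is only a minimal sketch of that baseline, not a solution to the memory problem: the sub name `check_records`, its callback argument, and the file names in the usage comment are hypothetical, and it assumes one record per line with nothing but the trailing newline to strip.

```perl
use strict;
use warnings;

# Hypothetical helper: report, for each record in the second file,
# whether it also appears in the first file.
sub check_records {
    my ($first_path, $second_path, $callback) = @_;

    # Load every record of the first file as a hash key.
    # This is the step whose memory use grows with the file size.
    my %seen;
    open my $first_fh, '<', $first_path
        or die "Cannot open $first_path: $!";
    while (my $record = <$first_fh>) {
        chomp $record;
        $seen{$record} = 1;
    }
    close $first_fh;

    # Stream the second file one line at a time and look each record up.
    open my $second_fh, '<', $second_path
        or die "Cannot open $second_path: $!";
    while (my $record = <$second_fh>) {
        chomp $record;
        $callback->($record, exists $seen{$record} ? 1 : 0);
    }
    close $second_fh;

    return;
}

# Example use (file names are placeholders):
# check_records('first.txt', 'second.txt', sub {
#     my ($record, $found) = @_;
#     print "$record\n" unless $found;
# });
```

Only the first file is held in memory here; the second is read line by line, so the hash built from the first file is the part that pushes the limits.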
Replies are listed 'Best First'.
•Re: Efficient search through a huge dataset
by merlyn (Sage) on Oct 19, 2004 at 23:58 UTC
by johnnywang (Priest) on Oct 20, 2004 at 01:03 UTC
by pg (Canon) on Oct 20, 2004 at 02:11 UTC
by dragonchild (Archbishop) on Oct 20, 2004 at 02:54 UTC
by Corion (Patriarch) on Oct 20, 2004 at 11:39 UTC
by pg (Canon) on Oct 20, 2004 at 04:44 UTC
by Caron (Friar) on Oct 20, 2004 at 08:12 UTC
Re: Efficient search through a huge dataset
by tmoertel (Chaplain) on Oct 20, 2004 at 02:33 UTC
Re: Efficient search through a huge dataset
by lhoward (Vicar) on Oct 20, 2004 at 00:16 UTC
Re: Efficient search through a huge dataset
by fergal (Chaplain) on Oct 20, 2004 at 09:31 UTC
Re: Efficient search through a huge dataset
by Anonymous Monk on Oct 20, 2004 at 13:50 UTC
Re: Efficient search through a huge dataset
by artist (Parson) on Oct 19, 2004 at 23:57 UTC
by johnnywang (Priest) on Oct 20, 2004 at 01:00 UTC
Re: Efficient search through a huge dataset
by TedPride (Priest) on Oct 20, 2004 at 06:18 UTC