I don't have any experience in this field either, but I recommend trying a few approaches to see how they fare. Measurement is key here.
Start with 'one huge file to rule them all'. First benchmark how long it takes to `while (<>) { }` the whole file, then see how much running a regex on each line slows that down.
Most SQL databases are pretty good at efficiently storing gobs of data, even if you're only accessing it sequentially. Try something similar to the above approach but just use a SQL table to back it.
Finally, are you sure it's the open/close overhead that would kill the naive approach? My instinct says the same, but neither of us can tell without measuring. The first approach gives you a solid baseline for how long a `while (<>) { }` takes on the raw data, so compare against that.