in reply to how to speed lookups?
I personally believe I have some problems understanding your question. I'm very tired, so it may be just me, but let me try to recap from your description of the problem, and please tell me if I'm getting anything wrong...
- "@validbufs is a unique list of strings." (Taken verbatim!) Thus not regexen...
- You need to "print out the lines in BUFFER.dat that have the first column (in that line) exactly matching any element in the array @validbufs." (Also verbatim but for a typo.) Thus what does it mean to "exactly match" in this context?
I think I eventually got it: you want to print those lines of your input file, i.e. BUFFER.dat, whose first field is an element of @validbufs. Is this a correct rephrasing of your problem? If so, then the general rule is to always turn @validbufs into a hash, say %isvalid. (As of Perl 5.10 you could also take advantage of the smartmatch operator ~~, but I'm sure that for a problem of this size a hash is still better.) This can be just as simple as:
my %isvalid; # outside of the loop over lines, of course!
@isvalid{@validbufs}=(1) x @validbufs;
Then you'd simply
print $line if $isvalid{$field};
with $field obtained as per ikegami's suggestion, or perhaps not even created as an intermediate variable.
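To put the pieces together, here is a minimal sketch of the whole thing. Note the assumptions: I'm guessing that the fields are whitespace-separated (hence the split), matching_lines is just a name I made up for illustration, and the sample data is invented to stand in for BUFFER.dat.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines read from $fh whose first
# field is an element of @$validbufs.
sub matching_lines {
    my ($validbufs, $fh) = @_;

    # Build the lookup hash once, outside of the loop over lines.
    my %isvalid;
    @isvalid{@$validbufs} = (1) x @$validbufs;

    my @out;
    while (my $line = <$fh>) {
        # Assumption: fields are separated by whitespace.
        my ($field) = split ' ', $line;
        next unless defined $field;    # skip blank lines
        push @out, $line if $isvalid{$field};
    }
    return @out;
}

# Invented sample data standing in for BUFFER.dat.
my $sample = "buf1 10 20\nbuf2 30 40\nbuf3 50 60\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @validbufs = qw(buf1 buf3);
print matching_lines(\@validbufs, $fh);
close $fh;
```

With the real file you'd open BUFFER.dat instead of the in-memory string, and you could just as well print inside the loop rather than collecting the lines first.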
Now, it must be said that since, as you claim, there is a 1-1 correspondence between the lines that match in the file and the entries of @validbufs, one may be tempted to delete from %isvalid any key that has matched, to shrink it, because once it has matched it won't match any more. But given the amount of work this would inflict on %isvalid, and given hashes' efficiency at lookups, I'm sure that both a big-Oh analysis and experimental evidence would show that it takes more work overall than not doing it...
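If you want to check this for yourself rather than take my word for it, the two variants could be sketched as follows. Again, filter_keep and filter_delete are hypothetical names, the whitespace split is my assumption, and the sample lines are invented; under the 1-1 correspondence you describe both should select exactly the same lines.

```perl
use strict;
use warnings;

# Variant A: plain lookups, the hash is never modified.
sub filter_keep {
    my ($validbufs, $lines) = @_;
    my %isvalid;
    @isvalid{@$validbufs} = (1) x @$validbufs;
    return grep {
        my ($f) = split ' ';           # first field of $_
        defined $f && $isvalid{$f};
    } @$lines;
}

# Variant B: remove each key as soon as it has matched.
sub filter_delete {
    my ($validbufs, $lines) = @_;
    my %isvalid;
    @isvalid{@$validbufs} = (1) x @$validbufs;
    return grep {
        my ($f) = split ' ';
        # delete returns the removed value (1) on a hit, undef on a miss.
        defined $f && delete $isvalid{$f};
    } @$lines;
}

# Invented sample data: each valid key occurs at most once.
my @lines     = ("buf1 10\n", "buf2 20\n", "buf3 30\n");
my @validbufs = qw(buf1 buf3);

print "keep:   ", filter_keep(\@validbufs, \@lines);
print "delete: ", filter_delete(\@validbufs, \@lines);
```

To compare timings you could feed both subs to cmpthese from the core Benchmark module, using realistically sized data; I haven't included numbers here since they'd depend on your file.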