Very similar to code I wrote for a similar project. I found
that with a huge amount of data it would bog
the machine down. The way I got around this was to work
with sorted data (using the Unix sort command) and to
buffer the input: read many lines from the first file,
then go to the second file and output all the
matching lines, keep reading the second file until
the buffer is full, then go back to the first, and so on.
Note that I was working with data I knew, so I was able to tune
the buffering for the machine it ran on.
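
For anyone who wants to try the idea, here is a rough Python sketch of
buffered matching over two pre-sorted inputs. It is not my original code,
and it simplifies things: it buffers only the first file and streams the
second. The name buffered_match, the buffer_lines parameter, and the
default key (the whole line) are made up for illustration. It assumes
both inputs are sorted in the same order Python uses to compare strings.

```python
from itertools import islice


def buffered_match(sorted_a, sorted_b, buffer_lines=10000,
                   key=lambda line: line.rstrip("\n")):
    """Yield lines of sorted_b whose key also occurs in sorted_a.

    Assumption: both inputs are already sorted by key, in the same
    order Python uses for string comparison.
    """
    a_iter = iter(sorted_a)
    b_iter = iter(sorted_b)
    pending = next(b_iter, None)  # one line of look-ahead from the second input

    while pending is not None:
        # Fill the buffer with the next run of lines from the first input.
        chunk = list(islice(a_iter, buffer_lines))
        if not chunk:
            break  # first input exhausted, nothing left to match against
        keys = {key(line) for line in chunk}
        last_key = key(chunk[-1])  # largest key in this buffer

        # Walk the second input up to the largest buffered key.
        while pending is not None and key(pending) <= last_key:
            if key(pending) in keys:
                yield pending
            pending = next(b_iter, None)


if __name__ == "__main__":
    a = ["apple\n", "banana\n", "cherry\n", "date\n"]
    b = ["banana\n", "blueberry\n", "cherry\n", "cherry\n", "fig\n"]
    for line in buffered_match(a, b, buffer_lines=2):
        print(line, end="")
```

Open file objects can be passed in place of the lists. Only buffer_lines
lines of the first input are held in memory at a time, so that is the
number to tune for the machine. For plain ASCII data, sorting with
LC_ALL=C sort gives byte order, which matches the comparison used here.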