While I don't completely understand the work you're trying to do, I suggest checking out GNU Parallel. It's usually a good fit for this type of task. It follows the Unix pipeline philosophy well and lets you, the programmer, focus on the algorithm while it takes care of scheduling tasks, distributing work, etc.
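So, for example, write a simple program that does the lookups line by line, reading from stdin and writing to stdout. Then call parallel with the --pipe option: it will split your input file into chunks (on line boundaries), feed each chunk to a copy of your program on stdin, and by default run one job per available CPU core.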
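Since I don't know what your lookups actually are, here is only a rough sketch of the kind of line-by-line filter I mean, assuming Python and a simple in-memory table. The file name `lookup.py`, the `LOOKUP` dictionary, the `process` function and the `NOT_FOUND` marker are all hypothetical placeholders for your real logic.

```python
#!/usr/bin/env python3
"""lookup.py: hypothetical line-by-line lookup filter for use with `parallel --pipe`.

Reads one key per line from stdin and writes "key<TAB>value" lines to stdout.
"""
import sys

# Hypothetical lookup table; in practice this might be loaded from a file or database.
LOOKUP = {
    "apple": "fruit",
    "carrot": "vegetable",
}


def process(lines, table):
    """Yield one output line per non-blank input line; unknown keys map to NOT_FOUND."""
    for line in lines:
        key = line.rstrip("\n")
        if not key:
            continue  # skip blank lines
        yield f"{key}\t{table.get(key, 'NOT_FOUND')}\n"


def main():
    # Each parallel job receives its chunk of the input on stdin,
    # so the program only needs to stream stdin to stdout.
    sys.stdout.writelines(process(sys.stdin, LOOKUP))


if __name__ == "__main__":
    main()
```

You could then run it on all cores with something like `parallel --pipe -k python3 lookup.py < input.txt > output.txt`, where `-k` keeps the output in the same order as the input. Because every chunk starts a fresh process, any expensive setup (such as loading a large lookup table) is repeated once per chunk, so raising the chunk size with `--block` (1 MB by default) may help.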