It looks like you're already getting some help with the problem you posted, so I won't elaborate on that.
However, your program looks like it could be fun to play around with and try to optimize a bit. To that end, I'd like to run it, but I don't know enough about your field or its terminology to write a configuration file that will actually run and do something. Could you post a few simple config files that set up some basic runs using the test dataset you provided? If so, I may be able to tweak your program to improve things a bit and send a few pull requests your way.