In fact, I developed each of the test subroutines based on the results of comparisons run on a subset of the data. What I did was keep creating new tests and writing A, B, and the comparison score to a CSV file. I stopped once I had a good result: a usable limit score with few false positives and few false negatives.
I think you could do it the same way. There's no need for anything very sophisticated: just a subset of the database and many runs, improving the tests you use each time.
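
To make that loop concrete, here is a minimal sketch in Python. It is not the code I actually used: the `score` function (built on `difflib.SequenceMatcher`), the hand-labeled `sample` pairs, and the `0.8` limit are all assumptions for illustration, so swap in your own tests and data.

```python
import csv
import io
from difflib import SequenceMatcher


def score(a, b):
    """Stand-in comparison test (an assumption); replace with your own subroutines."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(pairs, threshold, out):
    """Write A, B, score rows to `out` and count errors at `threshold`.

    `pairs` holds (a, b, is_match) tuples, where is_match is a hand-made label
    saying whether A and B really are the same thing.
    """
    writer = csv.writer(out)
    writer.writerow(["A", "B", "score"])
    false_pos = false_neg = 0
    for a, b, is_match in pairs:
        s = score(a, b)
        writer.writerow([a, b, f"{s:.3f}"])
        predicted = s >= threshold
        if predicted and not is_match:
            false_pos += 1
        elif not predicted and is_match:
            false_neg += 1
    return false_pos, false_neg


# Hypothetical labeled subset; in practice this comes from your database.
sample = [
    ("John Smith", "john smith", True),
    ("John Smith", "Jon Smith", True),
    ("John Smith", "Mary Jones", False),
]

buf = io.StringIO()  # use open("results.csv", "w", newline="") for a real file
fp, fn = evaluate(sample, 0.8, buf)
print(buf.getvalue())
print(f"false positives: {fp}, false negatives: {fn}")
```

After each run you look at the CSV, see which pairs land on the wrong side of the limit, adjust the tests or the limit, and run it again.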