Data validation is not performed in the generation application itself. Instead, we've written a few companion scripts that read the generated data, perform integrity checks, and compare each record against every other record read in.
Each line of data contains three records, so the script splits each line and computes a SHA-1 digest for each record. The digests are written to a file, which is then sorted with File::Sort. After that, it's a simple matter of reading each line of the sorted file and comparing it against the previous line to spot duplicate records.
I chose to hash each record because a SHA-1 digest is significantly smaller than the record itself, and two records produce the same digest only if they are identical (barring the astronomically unlikely case of a SHA-1 collision).
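The split/hash/sort/compare pipeline can be sketched as follows. This is a minimal Python illustration, not the actual Perl scripts: it assumes each line splits into three fixed-width records (the real record layout isn't shown here), and it sorts the digests in memory rather than writing them to a file and using an external sort as File::Sort does.

```python
import hashlib

def sha1_digests(lines):
    """Split each line into three fixed-width records (an assumption;
    the real record layout isn't specified) and SHA-1 each record."""
    for line in lines:
        line = line.rstrip("\n")
        width = len(line) // 3
        for i in range(3):
            record = line[i * width:(i + 1) * width]
            yield hashlib.sha1(record.encode()).hexdigest()

def find_duplicates(lines):
    """Sort the digests, then compare each digest to its predecessor;
    any adjacent match flags an (almost certainly) duplicate record."""
    digests = sorted(sha1_digests(lines))
    return [d for prev, d in zip(digests, digests[1:]) if prev == d]
```

Sorting first means duplicates become adjacent, so the final pass only ever compares neighboring lines, which is what makes the scan over the sorted file so simple.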