PerlMonks  

(Guildenstern) Re: Re: Taming a memory hog

by Guildenstern (Deacon)
on Nov 12, 2003 at 16:44 UTC


in reply to Re: Taming a memory hog
in thread Taming a memory hog

The data validation is not performed in the generation application. We've written a few companion scripts that read in the generated data, perform some integrity checks, and compare data with all other data read in.

Each line of data holds three records, so the script splits the line and computes a SHA-1 digest for each record. Each digest is written to a file, which is then sorted with File::Sort. After that, it's a simple matter of reading the sorted file line by line and comparing each line against the previously read line: two identical neighbors mean a duplicate record.
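For anyone who wants to see the shape of that pipeline, here is a rough sketch. It is written in Python rather than Perl and is not the actual companion script: the tab separator between records, the function names, and the chunk size are all assumptions made for illustration, and a small chunked merge sort stands in for File::Sort.

```python
import hashlib
import heapq
import itertools
import os
import tempfile


def write_digests(lines, out_path, sep="\t"):
    """Split each line into records and write one SHA-1 hex digest per record.

    The separator is an assumption; the real record format is not given.
    """
    with open(out_path, "w") as out:
        for line in lines:
            for record in line.rstrip("\n").split(sep):
                digest = hashlib.sha1(record.encode("utf-8")).hexdigest()
                out.write(digest + "\n")


def sort_file(in_path, out_path, chunk_lines=100_000):
    """Sort a file on disk: sort bounded chunks, then merge them lazily.

    This stands in for File::Sort; only one chunk is held in memory at a time.
    """
    chunk_paths = []
    with open(in_path) as src:
        while True:
            chunk = list(itertools.islice(src, chunk_lines))
            if not chunk:
                break
            chunk.sort()
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(chunk)
            chunk_paths.append(path)

    files = [open(p) for p in chunk_paths]
    try:
        with open(out_path, "w") as out:
            # heapq.merge reads the already-sorted chunk files lazily.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)


def count_duplicates(sorted_path):
    """Count lines equal to the line just before them in a sorted file."""
    dupes = 0
    prev = None
    with open(sorted_path) as f:
        for line in f:
            if line == prev:
                dupes += 1
            prev = line
    return dupes
```

Because the duplicate check only ever holds the current and previous line, memory use stays flat no matter how many records were generated; the sort is the only step that needs tuning.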

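I chose to compute the SHA for each record because the SHA value is significantly smaller than the record (a SHA-1 digest is always 20 bytes, or 40 characters in hex), and two different records are extremely unlikely to produce the same SHA value. That isn't a strict guarantee, since collisions are theoretically possible, but in practice matching SHA values mean the records are identical.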
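The size argument is easy to check. The snippet below is a Python illustration, not part of the original scripts, and the 500-character record is a made-up example:

```python
import hashlib


def sha1_sizes(record):
    """Return (raw digest bytes, hex digest characters) for one record."""
    digest = hashlib.sha1(record.encode("utf-8"))
    return digest.digest_size, len(digest.hexdigest())


# Hypothetical records of very different lengths.
short_record = "abc"
long_record = "x" * 500

print(sha1_sizes(short_record))  # (20, 40)
print(sha1_sizes(long_record))   # (20, 40)
```

Whatever the record length, the value that gets written and sorted is the same fixed size, which is what keeps the SHA file small.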


Guildenstern
Negated character class uber alles!
