Re: Re: Re: Bloom::Filter Usage

by pelagic (Priest)
on Apr 20, 2004 at 14:08 UTC [id://346637]


in reply to Re: Re: Bloom::Filter Usage
in thread Bloom::Filter Usage

That sounds interesting!

I see the following possibilities:
• If your DB happens to be Oracle, you could load your 30 million recs with a pretty fast tool (SQL*Loader) and let the tool write the duplicate-key records to a specified rejected-records ("bad") file. In a second step you could walk through your exceptions only and then update your DB (a sketch of such a load follows below).
• Sort your file before you read sequentially through it. Sorting a biggie will take its time, but afterwards you have all entries of one account grouped together, which reduces your memory consumption (see the Perl sketch after the SQL*Loader example).
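A minimal sketch of what such a control file might look like; the table and column names here are invented for illustration, and this assumes a conventional-path load, where rows violating the unique key are rejected into the bad file (direct-path loads handle unique constraints differently):

    -- accounts.ctl (hypothetical); run with: sqlldr userid=user/pass control=accounts.ctl
    LOAD DATA
    INFILE 'accounts.dat'
    BADFILE 'accounts.bad'      -- rows rejected by the DB (e.g. duplicate keys) land here
    DISCARDFILE 'accounts.dsc'  -- rows failing a WHEN clause would land here
    APPEND
    INTO TABLE accounts
    FIELDS TERMINATED BY ','
    ( account_id, amount )

Afterwards, accounts.bad contains exactly the duplicate-key records, and you only have to post-process that (hopefully much smaller) file.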

pelagic
