jreades has asked for the wisdom of the Perl Monks concerning the following question:
I have a large file (30 million records) containing a two-field key that is supposed to be unique. Unfortunately, it isn't necessarily that way... let me try to explain:
- The file comes to me with these fields (amongst others):
- Account Number
- Account Open Date
- Account Close Date
- Account numbers can be reused, but only if the account has been closed. Unless you have data issues. Which I do. :(
- So I need a quick way to check while processing (amongst up to 30 million unique keys) whether I have seen this key before (i.e. whether it's a dupe that needs checking).
- The normal hash lookup method $cache{$account} works (a minimal version is sketched after this list), but with ever-decreasing performance and an ever-increasing memory profile (it got up to 1.4GB of memory usage).
- So a Perl monger suggested looking at Bloom filters. More on Bloom filters is available here: Perl.com
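For reference, the hash-based check looks roughly like the sketch below. The pipe-delimited layout, the first-field position of the account number, and the file name are placeholders for illustration only, not the real record format:

use strict;
use warnings;

# Roughly what the plain-hash check does: one hash entry per distinct key,
# so memory grows with the number of unique account numbers.
my $path = 'accounts.dat';                    # placeholder file name
open my $fh, '<', $path or die "Cannot open $path: $!";

my %cache;
while ( my $line = <$fh> ) {
    chomp $line;
    my ($account) = split /\|/, $line;        # assumes the account number is the first field
    if ( $cache{$account}++ ) {
        # seen before: flag as a possible dupe for checking
    }
    else {
        # first occurrence of this account number
    }
}
close $fh;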
I've found the Bloom::Filter module on CPAN but can't get it to work, and I'm also worried about the level of false positives I'm facing.
My current code is:
my $bloom_filter = Bloom::Filter->new(
    error_rate => 0.001,
    capacity   => 30000000,
);

if ( $bloom_filter->check($account_number) ) {
    # ... do deduping ...
}
else {
    $bloom_filter->add($account_number);
    # ... do something ...
}
I'm looking for wisdom on two fronts:
- My current code just warns about a lack of salts for the filter, but I can't work out for the life of me what this module expects as a salt: the documentation is... minimal... and everything I've tried passing in makes my script malfunction in spectacular ways.
- The false-positive rate is inversely related to the memory footprint of the filter, but I'm having trouble with the calculations that would let me find the right trade-off (a back-of-the-envelope sizing is sketched below).
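For the sizing side, here is a back-of-the-envelope calculation using the standard textbook Bloom filter formulas (m = -n*ln(p)/(ln 2)^2 bits and k = (m/n)*ln 2 hash functions); this is not read out of Bloom::Filter's internals, just the general theory:

use strict;
use warnings;
use POSIX qw(ceil);

# Standard Bloom filter sizing:
#   bits            m = -n * ln(p) / (ln 2)^2
#   hash functions  k = (m / n) * ln 2
my $n = 30_000_000;   # expected number of keys
my $p = 0.001;        # acceptable false-positive rate

my $m = ceil( -$n * log($p) / ( log(2) ** 2 ) );
my $k = ceil( ( $m / $n ) * log(2) );

printf "%d bits (about %.0f MB), %d hash functions\n",
    $m, $m / 8 / 1024 / 1024, $k;
# For n = 30 million and p = 0.001 this works out to roughly 431 million
# bits, i.e. around 51 MB of filter, with about 10 hash functions.

If those numbers are right, the filter would fit comfortably in memory compared to the 1.4GB hash, which is presumably the point of the suggestion.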
Thanks