PerlMonks  

Re^3: I sense there is a simpler way...

by jdporter (Paladin)
on Aug 22, 2004 at 20:30 UTC


in reply to Re^2: I sense there is a simpler way...
in thread I sense there is a simpler way...

Is gobbling an entire file into an array considered bad form? . . .

One should always be aware of the efficiency concern. If you're sure the file will never be "too big", slurping (as it's called) shouldn't be a problem. Otherwise, you'd do well to read and process the file one record at a time, where practical.
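To make the difference concrete, here is a minimal sketch of the two reading styles. The sub names and the sample data are made up for illustration, and an in-memory filehandle stands in for a real file so the snippet runs on its own. Both subs find the length of the longest line; the only difference is how much of the input is held in memory at once.

```perl
use strict;
use warnings;

# Slurping: list-context readline pulls every line into @lines at once,
# so memory use grows with the size of the file.
sub longest_line_slurp {
    my ($fh) = @_;
    my @lines = <$fh>;
    my $longest = 0;
    for my $line (@lines) {
        chomp $line;
        $longest = length($line) if length($line) > $longest;
    }
    return $longest;
}

# Per-record: scalar-context readline returns one line per iteration,
# so only the current line is held in memory.
sub longest_line_streaming {
    my ($fh) = @_;
    my $longest = 0;
    while ( my $line = <$fh> ) {
        chomp $line;
        $longest = length($line) if length($line) > $longest;
    }
    return $longest;
}

# Hypothetical sample data, read through in-memory filehandles.
my $data = "alpha\nbeta\ngamma delta\n";
open my $fh1, '<', \$data or die "Cannot open in-memory handle: $!";
open my $fh2, '<', \$data or die "Cannot open in-memory handle: $!";

printf "slurp: %d, streaming: %d\n",
    longest_line_slurp($fh1), longest_line_streaming($fh2);
```

For a small file the two are interchangeable; the per-record form only starts to matter once the input is large enough that holding all of it in an array is a real cost.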

Calin's solution is good. If you want a little extra efficiency, you can buy it with memory, i.e. data structures. In the solution below, we maintain a separate hash for those keys which are known to be duplicates. Then, at the end, we iterate only over that hash. This has a pay-off if the number of duplicate keys is significantly smaller than the total number of keys.

my( %keys, %dup );

while (<STDIN>)
{
    chomp;
    if ( /PROBABLECAUSE\w*\((\d+),\s*\w*,\s+(\w*)/ )
    {
        my( $id, $key ) = ( $1, $2 );

        if ( exists $dup{$key} ) # already found to be a dup
        {
            push @{ $dup{$key} }, $id;
        }
        elsif ( exists $keys{$key} ) # only seen once before
        {
            push @{ $dup{$key} }, delete($keys{$key}), $id;
        }
        else # first time seen
        {
            $keys{$key} = $id;
        }

        # check if any key has init caps (not allowed)
        if ( $key =~ /^[A-Z]\w*/ )
        {
            print "Id: $id - $key\n";
        }
    }
}

print "\nDuplicated keys:\n\n";

for my $key ( keys %dup )
{
    print "Key: $key\n";
    print "\tId: $_\n" for @{$dup{$key}};
}
(Not tested)

Re^4: I sense there is a simpler way...
by HelgeG (Scribe) on Aug 23, 2004 at 09:43 UTC
    jdporter, thanks. I knew there was a reason for entering the monastery, and the replies I have received to my query have been interesting and educational.

    I like this last solution where instead of going through the entire data on the second pass, we only look at known duplicates.

    Working with Perl for the last few weeks has been somewhat of a revelation to me. It is amazing how much real work can be accomplished with a few lines of carefully chosen code.
