
Re: Hash checking

by njcodewarrior (Pilgrim)
on Apr 25, 2007 at 20:58 UTC ( [id://612094]=note )


in reply to Hash checking

No need to use a hash if you're just reading numbers from a file. Use an array instead:

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'AXP_FACS.DAT';
open my $FH, '<', $file or die "Error opening file: $!";
my @faclist = (<$FH>);    # Read all lines into the array
close $FH;
chomp @faclist;           # Remove the newline from each entry

my @numbers = 1 .. 5000;
foreach my $integer (@numbers) {
    unless ( grep { /\b$integer\b/ } @faclist ) {
        print "Not found: $integer\n";
    }
}

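The '\b' at the start and end of the grep regular expression is a word-boundary anchor. It makes $integer match a whole number in the line rather than a run of digits inside a longer number, so 12 will not match 123.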
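To make the effect of '\b' concrete, here is a minimal sketch comparing the anchored and unanchored patterns. The entries in @entries are made-up sample values, not data from AXP_FACS.DAT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample entries, not taken from AXP_FACS.DAT.
my @entries = ( '123', '12', '512' );
my $integer = 12;

# Without anchors, 12 matches every entry that contains "12".
my @loose = grep { /$integer/ } @entries;

# With \b on both sides, only the standalone number matches.
my @strict = grep { /\b$integer\b/ } @entries;

print "loose:  @loose\n";     # loose:  123 12 512
print "strict: @strict\n";    # strict: 12
```

Note that '\b' only checks for a word boundary, so 12 would still match inside an entry such as '12.5' or '12-34'. If every line holds exactly one number, a numeric comparison ($_ == $integer) is a stricter test.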

njcodewarrior

Re^2: Hash checking
by GrandFather (Saint) on Apr 25, 2007 at 22:12 UTC
    No need to use a hash ...

    unless you are concerned with execution time. grep performs a linear search through the entire array each time through the loop, so the overall search is O(n^2). A hash lookup takes essentially constant time, so the overall search is O(n).
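A minimal sketch of the hash-based approach described here, assuming the file lines have already been read and chomped into @faclist. The sample values and the $max bound are placeholders for illustration; the thread itself checks 1..5000.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the chomped lines of the data file.
my @faclist = ( 1, 2, 4, 5 );
my $max     = 6;    # assumed upper bound; the thread uses 5000

# One pass over the list to build the lookup table.
my %seen = map { $_ => 1 } @faclist;

# One hash lookup per candidate number instead of a scan of the array.
my @missing = grep { !exists $seen{$_} } 1 .. $max;

print "Not found: $_\n" for @missing;    # prints 3 and 6
```

The hash is built once, so the total work grows with the number of entries plus the number of candidates rather than with their product.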


    DWIM is Perl's answer to Gödel

      Thanks for the reply, GrandFather. Not that I didn't believe you, but here's a benchmark to back it up:

      #! /usr/bin/perl
      
      use strict;
      use warnings;
      
      use File::Spec;
      use Data::Dumper;
      use Benchmark qw( timethese cmpthese );
      
      
      my ( undef, undef, $app ) = File::Spec->splitpath( $0 );
      
      open my $DATA, '<', './AXP_FACS.DAT' or die "Error opening file: $!";
      my @faclist = (<$DATA>);
      chomp @faclist;
      close $DATA;
      
      sub grep_by_array {
          my ( $ref ) = @_;
          my @faclist = @$ref;
          my @found;
          foreach my $integer ( 1..5000 ) {
              if ( grep { /\b$integer\b/ } @faclist ) {
                  unshift @found, $integer;
              }
          }
      
          return \@found;
      
      }
      
      # Convert the array to a hash with the numbers as keys
      my %list = map { $_ => 1 } @faclist;
      
      sub grep_by_hash {
          my ( $ref ) = @_;
          my %faclist = %$ref;
          my @found;
          foreach my $integer( 1..5000 ) {
              if ( exists $faclist{$integer} ) {
                  unshift @found, $integer;
              }
          }
      
          return \@found;
      
      }
      
      # Benchmark the 2 subs
      my $r = timethese( 5000, {
              'array' => sub{ grep_by_array(\@faclist) },
              'hash'  => sub{ grep_by_hash(\%list) },
          }
      );
      
      cmpthese( $r );
      

      RESULTS:

      Benchmark: timing 5000 iterations of array, hash...
           array: 339 wallclock secs (339.02 usr +  0.04 sys = 339.06 CPU) @ 14.75/s (n=5000)
            hash:  8 wallclock secs ( 8.27 usr +  0.00 sys =  8.27 CPU) @ 604.59/s (n=5000)
              Rate array  hash
      array 14.7/s    --  -98%
      hash   605/s 4000%    --
      

      That's quite an improvement using a hash!
      You learn something every day...

      njcodewarrior
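For anyone reading the comparison table, the percentages follow from the two rates in the timethese output. The sketch below redoes that arithmetic; it assumes cmpthese reports 100 * (row rate / column rate - 1), which is consistent with the numbers shown but is not taken from the Benchmark source.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rates copied from the timethese output in this thread (iterations/second).
my %rate = ( array => 14.75, hash => 604.59 );

# How many times faster the hash version ran.
my $speedup = $rate{hash} / $rate{array};

# Relative differences, assuming 100 * (row / column - 1).
my $hash_vs_array = 100 * ( $rate{hash} / $rate{array} - 1 );
my $array_vs_hash = 100 * ( $rate{array} / $rate{hash} - 1 );

printf "speedup:       %.1fx\n",  $speedup;
printf "hash vs array: %.0f%%\n", $hash_vs_array;
printf "array vs hash: %.0f%%\n", $array_vs_hash;
```

The hash version comes out roughly 41 times faster, which lines up with the 4000% and -98% entries once the table's rounding is taken into account. Both subs also copy their argument on every call (my @faclist = @$ref and my %faclist = %$ref), so the timings include that copying cost as well as the lookups.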
