PerlMonks  

Grepping arrays, any better way to do this?

by chanakya (Friar)
on Feb 15, 2007 at 10:49 UTC

chanakya has asked for the wisdom of the Perl Monks concerning the following question:

Dear Esteemed monks,

I'm working on a task which is as below:

* Generate a list of dates from a given start date to a given end date
* Get a list of files for each item (these are directories) in the CSV file
* Once I have the two arrays, i.e. one array with the list of dates and a second array with the list of files for a directory, I have to check whether each date from the first array exists in the second array.

* The dates array and the files array (which contains the filenames) are structured as follows:

    @datelist  = ("20030901", "20061017", "20050406", "20070101", "20080202");
    @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406");

I am using grep to check the existence of dates from the first array in the second array.
Below is the code I'm using. Please let me know whether this approach is correct, or whether there is a better way:

    #!/usr/bi/perl
    $driver = "./dirlist.csv";
    open DRV , $driver || die "Cannot open $driver: $!";
    while( <DRV> ){
        chomp;
        my ($dirname) = shift;
        checkExistingDates($dirname);
    }
    close DRV;

    sub checkExistingDates{
        my $dirname = shift;
        my @datelist = ("20030901", "20061017", "20050406", "20070101", "20080202");
        #the file list should be read from the $dirname, for testing using hard values
        my @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406");
        my @matched;
        foreach my $date (@datelist}){
            @matched = grep{$date} @fileslist;
        }
        print Dumper @matched;
    }

Thanks in advance

Replies are listed 'Best First'.
Re: Grepping arrays, any better way to do this?
by bart (Canon) on Feb 15, 2007 at 11:15 UTC
    I have to check for the existence of each of the dates from the first array in the second array
    Whenever I see a phrase like this, a red light goes on in my head, with the caption "use a hash!"

    Your code isn't actually working (that grep looks very fishy, for one), so I can't produce equivalent working code using a hash. The idea is that you build a hash while traversing one array, and then use the existence of an item in that hash as a flag while traversing the second array.

    Lookup in a hash is much faster than grepping through an array, and the larger the list, the higher the gain.

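    You do need to make sure you do an exact lookup, not an approximate one, for this to work. A hash key only matches a string that is identical to it, so the date has to be extracted from each filename before it is used as a key.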
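
The hash idea described above can be sketched as follows. This is a minimal illustration rather than the poster's own code: the helper name dates_with_files is hypothetical, and it assumes every filename ends in a dot followed by an eight-digit date, as in the sample data from the question.

```perl
use strict;
use warnings;

# Sample data from the question.
my @datelist  = ("20030901", "20061017", "20050406", "20070101", "20080202");
my @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017",
                 "DIR22.20060516", "DIR22.20050406");

# Hypothetical helper: returns the dates that appear as the date suffix
# of at least one filename.
sub dates_with_files {
    my ($dates, $files) = @_;

    # Build the lookup hash once: key = eight digits after the final dot.
    my %have;
    for my $file (@$files) {
        my ($stamp) = $file =~ /\.(\d{8})\z/ or next;   # skip names without a date
        $have{$stamp} = 1;
    }

    # One exact-match lookup per date; the order of @$dates is preserved.
    return grep { exists $have{$_} } @$dates;
}

my @found = dates_with_files(\@datelist, \@fileslist);
print "$_\n" for @found;
```

With the sample arrays this prints 20061017 and 20050406. The hash is built in one pass over the files and each date then costs a single lookup, instead of a full scan of @fileslist per date.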

Re: Grepping arrays, any better way to do this?
by izut (Chaplain) on Feb 15, 2007 at 11:17 UTC

    Assuming that every @fileslist element starts with the same fixed-length prefix (six characters, as in "DIR22."):

        use strict;
        use warnings;
        use Smart::Comments;

        my @datelist = ( "20030901", "20061017", "20050406", "20070101", "20080202" );
        my @fileslist = (
            "DIR22.20060816", "DIR22.20050919", "DIR22.20061017",
            "DIR22.20060516", "DIR22.20050406"
        );

        # creates a hash with all dates seen on @fileslist
        my %seen_dates;
        @seen_dates{ map { ( unpack( "x6A*", $_ ) )[0] } @fileslist } = ();

        # increases the value of each seen @datelist element on %seen_dates.
        foreach (@datelist) {
            if ( exists $seen_dates{$_} ) {
                $seen_dates{$_}++;
            }
        }

        # search for defined values for each %seen_dates key.
        my @seen_dates = grep { defined $seen_dates{$_} } keys %seen_dates;

        ### @seen_dates

    Igor 'izut' Sutton
    your code, your rules.

Re: Grepping arrays, any better way to do this?
by akho (Hermit) on Feb 15, 2007 at 11:44 UTC
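    You should push the grepped list onto @matched, not assign it. The way it is written, you only keep the result for the last date. Besides, the grep is messed up: grep {$date} only tests whether $date is true, so it returns every filename; you need a pattern match such as grep {/$date/}.

    You should also use strict and warnings. Data::Dumper is never loaded in your script (there is no use Data::Dumper), so nothing will be printed (and you don't get a warning).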

    The corrected script:

        #!/usr/bin/perl
        use warnings;
        use strict;
        use Data::Dumper;

        my $driver = "<dirlist.csv";
        open DRV , $driver or die "Cannot open $driver: $!";
        while( <DRV> ){
            chomp;
            my ($dirname) = shift;
            checkExistingDates($dirname);
        }
        close DRV;

        sub checkExistingDates{
            my $dirname = shift;
            my @datelist = ("20030901", "20061017", "20050406", "20070101", "20080202");
            my @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406");
            my @matched = ();
            foreach my $date (@datelist) {
                push (@matched, grep {/$date/} @fileslist);
            }
            print Dumper @matched;
        }

    But the hash solution mentioned above is much better.

    Upd: btw, $dirname is read from the command line, not from the file. I suppose you meant something along the lines of my $dirname = $_ or my $dirname = (split /,/)[0].
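
A possible way to take the directory name from each line of the file, along the lines of the (split /,/)[0] suggestion above, is sketched here. The helper name read_dirnames and the sample CSV contents are made up for illustration, and the sketch reads from an in-memory string so that it runs without a real dirlist.csv; a real script would open the file itself, for example with open my $fh, '<', 'dirlist.csv' or die "...".

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first comma-separated field of every
# non-empty line read from an already opened filehandle.
sub read_dirnames {
    my ($fh) = @_;
    my @dirs;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;            # ignore blank lines
        push @dirs, ( split /,/, $line )[0]; # first field is the directory
    }
    return @dirs;
}

# Stand-in for dirlist.csv; the contents are invented for this example.
my $sample = "DIR22,some description\nDIR23,another one\n";

# Three-argument open with a lexical filehandle. "or" binds loosely enough
# that die fires if open fails; with "||" the die would attach to the last
# argument instead and never run.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my @dirnames = read_dirnames($fh);
close $fh;

print "$_\n" for @dirnames;
```

A plain split on commas does not handle quoted fields that contain commas; if the CSV file can contain those, a dedicated parser such as the Text::CSV module from CPAN is the safer choice.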

Re: Grepping arrays, any better way to do this?
by chanakya (Friar) on Feb 15, 2007 at 14:05 UTC
    bart, izut, akho: thank you for your wonderful comments and code.
    I'd like to know more about what's happening in izut's code, specifically the map and unpack part.

    Thank you once again
          # creates a hash with all dates seen on @fileslist
          my %seen_dates;
          @seen_dates{ map { ( unpack( "x6A*", $_ ) )[0] } @fileslist } = ();

          # increases the value of each seen @datelist element on %seen_dates.
          foreach (@datelist) {
              if ( exists $seen_dates{$_} ) {
                  $seen_dates{$_}++;
              }
          }

      This code is already commented with what it is doing. The map/unpack part "magically" iterates over the array @fileslist and unpacks each entry in it. map is the iterator: it takes an expression or a block of code and applies it to each element of a given list. unpack skips 6 bytes of data (the "x6") and then grabs the rest as ASCII text (the "A*"). Since unpack can itself return a list, the author was careful to ask for exactly the first element of the one-element list (that's the ')[0]' part right after the unpack). Then, for each item in this new list, an empty hash entry is created.

      The upshot of all that is that it creates a hash entry for the 'date' portion of each file in @fileslist. Once the hash is built, the foreach section goes through the date list and, for each date that already exists as a key in the hash, increments that hash entry.
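
The walkthrough above can be followed one step at a time in the sketch below. It splits izut's one-liner into named intermediate variables (@file_dates and @matched are names introduced here for illustration) and assumes, as the original does, that every filename has a six-character prefix before the date.

```perl
use strict;
use warnings;

my @datelist  = ("20030901", "20061017", "20050406", "20070101", "20080202");
my @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017",
                 "DIR22.20060516", "DIR22.20050406");

# Step 1: "x6" skips the six bytes of "DIR22.", "A*" takes the rest.
my @file_dates = map { ( unpack "x6A*", $_ )[0] } @fileslist;

# Step 2: assigning the empty list to a hash slice creates one key per
# date in an ordinary hash; every value starts out as undef.
my %seen_dates;
@seen_dates{@file_dates} = ();

# Step 3: bump the entry for every wanted date that is already a key.
for my $date (@datelist) {
    $seen_dates{$date}++ if exists $seen_dates{$date};
}

# Step 4: bumped keys are now defined; untouched keys are still undef.
my @matched = grep { defined $seen_dates{$_} } keys %seen_dates;
@matched = sort @matched;    # hash key order is not predictable, so sort

print "@matched\n";
```

With the sample data this prints "20050406 20061017". Separating exists (is the key present?) from defined (has the value been set?) is what lets a single hash record both "seen in @fileslist" and "also wanted in @datelist".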

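      Note that the first part does not create lists or arrays of hashes: the slice assignment builds a simple hash whose keys are the dates and whose values are all undef, which is why the later test for defined values picks out only the incremented entries.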

      It is likely that the seemingly tortuous example given was to

      • try to teach you something by getting you to figure this stuff out yourself, and
      • keep you from actually using that in a homework assignment
      You'll notice I've not given you simpler-to-understand code, for these same reasons.

      -Scott
