Grepping arrays, any better way to do this?

chanakya has asked for the wisdom of the Perl Monks concerning the following question:

Dear Esteemed monks,

I'm working on a task which is as below:

* Generate a list of dates from a given start date to a given end date
* Get a list of files for each of the items (these are directories) in the CSV file
* Once I have the two arrays, i.e. one array consisting of a list of dates and a second array with the list of files for a directory, I have to check for the existence of each of the dates from the first array in the second array.

* The dates array and the files array (which contains the filenames) are structured as follows:
@datelist = ("20030901", "20061017", "20050406", "20070101", "20080202");
@fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406");
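A minimal sketch of the first step (building the list of dates), assuming dates in YYYYMMDD form and the Time::Piece and Time::Seconds modules bundled with Perl since 5.10. The helper name `date_range` is hypothetical and only illustrates one way this could be done:

```perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;

# Hypothetical helper: returns every date from $start to $end (inclusive)
# as YYYYMMDD strings. Both arguments are assumed to be valid YYYYMMDD dates.
sub date_range {
    my ( $start, $end ) = @_;
    my $t    = Time::Piece->strptime( $start, '%Y%m%d' );
    my $last = Time::Piece->strptime( $end,   '%Y%m%d' );

    my @dates;
    while ( $t->epoch <= $last->epoch ) {
        push @dates, $t->strftime('%Y%m%d');
        $t = $t + ONE_DAY;    # strptime yields UTC times, so a day is always 86400 seconds
    }
    return @dates;
}

my @datelist = date_range( '20070227', '20070302' );
print "$_\n" for @datelist;
```

The resulting list could then stand in for the hard-coded `@datelist` shown above.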

I am using grep to check the existence of dates from the first array in the second array.
Below is the code I'm using. Please let me know whether this approach is correct, or whether there is a better way:

#!/usr/bi/perl
$driver = "./dirlist.csv";
open DRV , $driver || die "Cannot open $driver: $!";

while( <DRV> ){
    chomp;
    my ($dirname) = shift;
    checkExistingDates($dirname);
}
close DRV;

sub checkExistingDates{
    my $dirname = shift;
    my @datelist = ("20030901", "20061017", "20050406", "20070101", "20080202");
    
    #the file list should be read from the $dirname, for testing using hard values
    my @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406");
    
    my @matched;
    foreach my $date (@datelist}){
               @matched = grep{$date} @fileslist;
               }
    print Dumper @matched;
    
    
}

Thanks in advance

Re: Grepping arrays, any better way to do this? by bart (Canon) on Feb 15, 2007 at 11:15 UTC
"I have to check for the existence of each of the dates from the first array in the second array"

Whenever I see a phrase like this, a red light goes on in my head, with the caption "use a hash!" Your code isn't actually working (that grep looks very fishy, for one), so I can't produce equivalent working code using a hash. The idea would be that you build a hash while traversing one array, and then use the existence of an item in this hash as a flag while traversing the second array. Lookup in a hash is much faster than grepping through an array, and the larger the list, the higher the gain. You do need to make sure you do an exact lookup, not an approximate one, for this to work.
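The hash idea described above can be sketched roughly as follows. This is only one possible reading of it: it assumes every file name ends in a dot followed by an eight-digit date, and the name `%wanted` is made up for illustration.

```perl
use strict;
use warnings;

my @datelist  = qw(20030901 20061017 20050406 20070101 20080202);
my @fileslist = qw(DIR22.20060816 DIR22.20050919 DIR22.20061017
                   DIR22.20060516 DIR22.20050406);

# Build the lookup hash once, while traversing the first array.
my %wanted = map { $_ => 1 } @datelist;

# Traverse the second array and keep each file whose trailing
# eight-digit date is a key in %wanted (an exact lookup, not a pattern match).
my @matched = grep {
    my ($date) = /\.(\d{8})\z/;
    defined $date && $wanted{$date};
} @fileslist;

print "$_\n" for @matched;
```

Each file costs one hash lookup, so the work grows with the sum of the two list sizes rather than their product.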
Re: Grepping arrays, any better way to do this? by izut (Chaplain) on Feb 15, 2007 at 11:17 UTC
Assuming that the first part of each `@fileslist` element will have that scheme:

use strict;
use warnings;
use Smart::Comments;

my @datelist = ( "20030901", "20061017", "20050406", "20070101", "20080202" );
my @fileslist = ( "DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406" );

# creates a hash with all dates seen on @fileslist
my %seen_dates;
@seen_dates{ map { ( unpack( "x6A*", $_ ) )[0] } @fileslist } = ();

# increases the value of each seen @datelist element on %seen_dates.
foreach (@datelist) {
    if ( exists $seen_dates{$_} ) {
        $seen_dates{$_}++;
    }
}

# search for defined values for each %seen_dates key.
my @seen_dates = grep { defined $seen_dates{$_} } keys %seen_dates;

### @seen_dates

Igor 'izut' Sutton
your code, your rules.
Re: Grepping arrays, any better way to do this? by akho (Hermit) on Feb 15, 2007 at 11:44 UTC
You should push the grepped list onto @matched, not assign. The way it is written, you only match the last date. Besides, the grep is messed up.

You should also use strict and warnings; Data::Dumper is not loaded in your script, so nothing will be printed (and you don't get a warning).

The corrected script:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $driver = "<dirlist.csv";
open DRV , $driver or die "Cannot open $driver: $!";

while( <DRV> ){
    chomp;
    my ($dirname) = shift;
    checkExistingDates($dirname);
}
close DRV;

sub checkExistingDates{
    my $dirname = shift;
    my @datelist = ("20030901", "20061017", "20050406", "20070101", "20080202");
    my @fileslist = ("DIR22.20060816", "DIR22.20050919", "DIR22.20061017", "DIR22.20060516", "DIR22.20050406");

    my @matched = ();
    foreach my $date (@datelist) {
        push (@matched, grep {/$date/} @fileslist);
    }
    print Dumper @matched;
}

But the hash solution mentioned above is much better.

Upd: btw, $dirname is read from the command line, not from the file. I suppose you meant something along the lines of `my $dirname = $_` or `my $dirname = (split /,/)[0]`.
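The point in the update can be sketched like this. The layout of dirlist.csv is not shown in the thread, so the sample data below is purely hypothetical and is read from an in-memory filehandle to keep the example self-contained. Note also that in `open DRV, $driver || die ...` the `||` binds more tightly than the comma, so the `die` can never fire; `or` (or parentheses around the arguments to `open`) avoids that.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of dirlist.csv.
my $csv = "DIR22,first directory\nDIR23,second directory\n";

# Three-argument open with a lexical filehandle; "or" has low enough
# precedence that die runs whenever open fails.
open my $drv, '<', \$csv or die "Cannot open in-memory CSV: $!";

my @dirnames;
while ( my $line = <$drv> ) {
    chomp $line;
    next unless length $line;                  # skip empty lines
    my $dirname = ( split /,/, $line )[0];     # first CSV field is taken as the directory
    push @dirnames, $dirname;
}
close $drv;

print "$_\n" for @dirnames;
```

For real CSV data with quoted fields, a dedicated parser such as the CPAN module Text::CSV would be a safer choice than a plain `split`.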
Re: Grepping arrays, any better way to do this? by chanakya (Friar) on Feb 15, 2007 at 14:05 UTC
bart, izut, akho: thank you for your wonderful comments and code. I'd like to know more about what's happening in izut's code, most specifically the map and unpack part. Thank you once again.
Re^2: Grepping arrays, any better way to do this? by 5mi11er (Deacon) on Feb 16, 2007 at 23:00 UTC
# creates a hash with all dates seen on @fileslist
my %seen_dates;
@seen_dates{ map { ( unpack( "x6A*", $_ ) )[0] } @fileslist } = ();

# increases the value of each seen @datelist element on %seen_dates.
foreach (@datelist) {
    if ( exists $seen_dates{$_} ) {
        $seen_dates{$_}++;
    }
}

This code is already commented with what it is doing. The map/unpack stuff is "magically" iterating over the array @fileslist and unpacking each entry therein. map is the iterator; it takes an expression or a block of code as the thing to do with each element in a given list. unpack is ignoring 6 bytes of data and then grabbing the rest as ASCII text. Since unpack can return a list itself, the author was careful to ask specifically for the first element of the one-element list (that's the ')[0]' part right after the unpack). Then, for each thing in this new list, an empty hash entry is created.

The upshot of all that is that it creates a hash entry for each 'date' portion of the files in fileslist. Once the hash entries are created, it goes through the date list, and for each date in the date list that is also in the hash, the hash entry is incremented (that's the foreach section of code).

Actually that code is not written to work correctly, as the first part is creating empty lists/arrays of hashes, not simple hashes anyway...

It is likely that the seemingly tortuous example given was meant to teach you something by getting you to figure this stuff out yourself, and to keep you from actually using it in a homework assignment. You'll notice I've not given you simpler-to-understand code, for the same reason.

-Scott
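To see what the map/unpack expression and the assignment to `@seen_dates{...}` actually produce, the pieces can be run separately. A small sketch using three of the sample file names from the thread (the intermediate array `@dates` is introduced here only for illustration): `@hash{LIST} = ()` is a hash slice, which adds one key per list element to an ordinary hash, each with an undefined value.

```perl
use strict;
use warnings;

my @fileslist = qw(DIR22.20060816 DIR22.20050919 DIR22.20061017);

# "x6" skips the six bytes of "DIR22."; "A*" takes everything after that as a string.
my @dates = map { ( unpack( 'x6A*', $_ ) )[0] } @fileslist;

# Hash slice: every date becomes a key of the plain hash %seen_dates,
# and each value is left undefined.
my %seen_dates;
@seen_dates{@dates} = ();

for my $date ( sort keys %seen_dates ) {
    my $exists  = exists( $seen_dates{$date} )  ? 'yes' : 'no';
    my $defined = defined( $seen_dates{$date} ) ? 'yes' : 'no';
    print "$date exists=$exists defined=$defined\n";
}
```

Because the values start out undefined, incrementing only the entries that also appear in `@datelist` is what lets the later `grep { defined ... }` pick out the matching dates.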

