PerlMonks  

Best way to match items in an array

by ibanix (Hermit)
on Dec 07, 2002 at 03:46 UTC ( [id://218209]=perlquestion )

ibanix has asked for the wisdom of the Perl Monks concerning the following question:

I have a list of file names in a variable-length @dir_list array. I want to match each item in the array that has some suffix, e.g. .zip, .log, whatever; then I'll have to do some operations on them. What is the most efficient way to do this, since I will be doing a great many of these?

I could do this, but is there a better way?
my $suffix = ".zip"; # value is likely to change

foreach my $file (@dir_list) {
    if ($file =~ /$suffix$/e) {
        print "Match on $file!\n";
        do_this();
        do_that();
    }
}
Thanks,
ibanix

$ echo '$0 & $0 &' > foo; chmod a+x foo; foo;

Replies are listed 'Best First'.
Re: Best way to match items in an array
by pg (Canon) on Dec 07, 2002 at 04:07 UTC
    Some suggestions:
    • use grep
    • use a hash for suffixes and their handlers
    use strict;
    use warnings;

    sub zip {
        my $file = shift;
        print "got a zip file $file\n";
    }

    sub txt {
        my $file = shift;
        print "got a txt file $file\n";
    }

    my @files = ("a.zip", "b.log", "c.zaaap", "d.txt");

    # before you have a particular handler developed,
    # you can just set it to undef
    my %handlers = ("zip" => \&zip, "zaaap" => undef, "txt" => \&txt);

    my $pattern = join("|", map {"\\\.$_\$"} keys %handlers);
    print "pattern = $pattern\n";

    foreach (grep {/$pattern$/} @files) {
        m/\.([^\.]*)$/;
        if (defined($handlers{$1})) {
            &{$handlers{$1}}($_);
        } else {
            print "No handler defined for $_\n";
        }
    }
      That certainly is a unique way to do this, and one I had not considered. However, I expect to have many, many different suffixes, and it would be difficult to create a handler for each one.

        You are not required to have a unique handler for each suffix. The suffixes are the keys of the hash and the handlers are just its values, so several suffixes can share one handler. You can also make the handlers accept more parameters for more complex situations.
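
        A minimal sketch of the shared-handler idea, for illustration only. The handler names handle_archive and handle_text and the sample file names are assumptions, not part of the reply above; several suffixes simply point at the same code reference:

```perl
use strict;
use warnings;

# Hypothetical handlers, for illustration only; each returns a
# description instead of doing real work.
sub handle_archive { my ($file) = @_; return "archive: $file" }
sub handle_text    { my ($file) = @_; return "text: $file" }

# Suffixes are the keys and handlers are the values, so one handler
# can serve several suffixes.
my %handlers = (
    zip => \&handle_archive,
    tgz => \&handle_archive,
    log => \&handle_text,
    txt => \&handle_text,
);

my @files = qw(a.zip b.tgz c.log d.txt e.bin);
my @results;

for my $file (@files) {
    # Capture the text after the last dot, if there is one.
    my ($suffix) = $file =~ /\.([^.]+)$/;
    my $handler = defined $suffix ? $handlers{$suffix} : undef;
    next unless $handler;    # no handler registered: skip the file
    push @results, $handler->($file);
}

print "$_\n" for @results;
```

        Adding a new suffix is then a one-line change to %handlers, with no new subroutine needed unless the behaviour really differs.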
Re: Best way to match items in an array
by Zaxo (Archbishop) on Dec 07, 2002 at 04:26 UTC

    This problem is made for glob.

    my $pattern = '*.zip';
    for my $file ( glob $pattern ) {
        next unless -f $file;
        print 'Doing this and that for ', $file, $/;
        do_this($file);
        do_that($file);
    }
    Note the check that the name represents a regular file.

    After Compline,
    Zaxo

      The files may not actually exist. That is, I may be working with file names that do not exist on the filesystem at the time the script is running, so I do not think glob will help me here.

Re: Best way to match items in an array
by stefp (Vicar) on Dec 07, 2002 at 06:29 UTC
    The first pass gathers the filenames in %file, keyed by suffix. The second pass handles the files suffix by suffix, provided a handler of the form handle_XXX is defined for the suffix XXX. Note that you can add new handlers without touching the logic.
    #! /usr/bin/perl
    use strict;

    my %file; # hash of filenames arrays keyed by suffix

    # first pass
    for (<*>) {
        push @{$file{$2}}, $_ if /(.*\.(.*))/;
    }

    # second pass
    for my $suffix (sort keys %file) {
        no strict 'refs';
        next unless defined &{"handle_$suffix"};
        &{"handle_$suffix"}($_) for sort @{$file{$suffix}}
    }

    # process the file with suffix '.cpp'
    sub handle_cpp {
        print "$_[0]\n";
    }

    -- stefp

Re: Best way to match items in an array
by Cmdr_Tofu (Scribe) on Dec 07, 2002 at 07:47 UTC
    here's another one:
    my @file_list = qw(file1.log file2.txt file3.log file4.txt ignorefiletype.bin file5.conf);

    # anchored so that only the suffix is tested, not the whole name
    my @file_list_for_processing = grep /\.(?:log|txt|conf)$/, @file_list;

    foreach my $file (@file_list_for_processing) {
        print "$file\n";
    }
    Rohit
      This is more what I was looking for. I should have generalized the question to: Given any number of elements in an array, what is the fastest way to find those elements that match any given number of regular expressions?
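
      One common approach to the generalized question, sketched here under assumptions (the sample @items and @patterns are made up for illustration): join the compiled patterns into a single alternation, compile that once with qr//, and run grep with the combined regex. Whether this is actually the fastest option depends on the patterns and the data, so it is worth measuring, for example with the core Benchmark module.

```perl
use strict;
use warnings;

# Example inputs, assumed for illustration.
my @items    = qw(file1.log file2.txt catalog.bin file5.conf notes.TXT);
my @patterns = (qr/\.log$/, qr/\.txt$/i, qr/\.conf$/);

# Join the compiled patterns into one alternation and compile it once,
# so each element is tested against a single regex. Each qr// object
# keeps its own flags (such as /i) when it is interpolated.
my $combined = do {
    my $alternation = join '|', @patterns;
    qr/$alternation/;
};

my @matched = grep { $_ =~ $combined } @items;

print "$_\n" for @matched;
```

      Because each pattern is anchored to the end of the name, catalog.bin is not selected even though it contains "log".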

Re: Best way to match items in an array
by bronto (Priest) on Dec 07, 2002 at 13:45 UTC

    I hope I understood your question

    # This is untested code!
    my @dlcopy = @dir_list ;
    my @extensions = qw(zip log whatever) ;
    my %matches ;

    foreach my $ext (@extensions) {
        my @nonmatching ;
        my @matching ;
        foreach my $file (@dlcopy) {
            if ($file =~ /\.$ext$/) {
                push @matching,$file ;
            } else {
                push @nonmatching,$file ;
            }
        }
        @dlcopy = @nonmatching ;
        $matches{$ext} = \@matching ;
    }

    That is: you work on a copy of @dir_list and iterate over it; you save in @nonmatching the files that didn't match the pattern, and in @matching those that did. At the end of every cycle the inner foreach gets shorter, since you are working only on the files that haven't matched so far. At the end of the game, %matches contains all @dir_list elements that matched, and @dlcopy all those that didn't.

    Of course, we can heavily optimize this; the double foreach could be optimized, for example using ?:, array references and only one push instead of the if block. And, of course, the pattern matching could be "half-hardcoded" with an eval at each cycle...

    But, at least, this should be clear code, even if untested :-)

    Ciao!
    --bronto

    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->make($love) ;
    }

      An optimization:

      my @dlcopy = @dir_list ;
      my @extensions = qw(zip log whatever) ;
      my %matches ;

      foreach my $ext (@extensions) {
          my $nonmatching = [] ;
          my $matching = [] ;
          my $pattern = qr(\.$ext$) ;
          foreach my $file (@dlcopy) {
              push @{$file =~ /$pattern/ ? $matching : $nonmatching},$file ;
          }
          @dlcopy = @$nonmatching ;
          $matches{$ext} = $matching ;
      }

      Ciao!
      --bronto


      Another optimization, using eval

      my @dlcopy = @dir_list ;
      my @extensions = qw(zip log whatever) ;
      my %matches ;

      foreach my $ext (@extensions) {
          my $nonmatching = [] ;
          my $matching = [] ;
          my $code = q|
              foreach my $file (@dlcopy) {
                  push @{$file =~ /\\.|.$ext.q|$/ ? $matching : $nonmatching},$file ;
              }
          | ;
          # warn "DEBUG:$code" ;
          eval $code ;
          if ($@) {
              die "Something went wrong: $@" ;
          } else {
              @dlcopy = @$nonmatching ;
              $matches{$ext} = $matching ;
          }
      }

      This one further optimizes the pattern matching, dynamically creating the inner foreach with the pattern hardcoded (and not reevaluated at each cycle). This could be a great enhancement if you are matching a number of files that is far greater than the number of extensions (since you would have a few evals and a lot of "constant" pattern matchings)
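
      For comparison, here is a different technique that is not from the posts above: a single pass over the file list that extracts each suffix and uses a hash lookup, instead of one pass per extension. This is only a sketch under assumptions. The sample @dir_list is made up, @unmatched is a name introduced here, and the approach only works when the extensions are literal strings without dots (it would not handle something like tar.gz as one extension).

```perl
use strict;
use warnings;

# Example data, assumed for illustration.
my @dir_list   = qw(a.zip b.log c.txt d.zip e f.tar.gz);
my @extensions = qw(zip log whatever);

# Pre-create an empty bucket for every wanted extension.
my %matches = map { $_ => [] } @extensions;
my @unmatched;

for my $file (@dir_list) {
    # Extension = text after the last dot; undef when there is no dot.
    my ($ext) = $file =~ /\.([^.]+)$/;
    if (defined $ext && exists $matches{$ext}) {
        push @{ $matches{$ext} }, $file;
    }
    else {
        push @unmatched, $file;
    }
}

printf "%s: %s\n", $_, join(' ', @{ $matches{$_} }) for @extensions;
```

      Each file is examined once regardless of how many extensions there are, which may matter when both lists are long.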

      Ciao!
      --bronto


Re: Best way to match items in an array
by ibanix (Hermit) on Dec 07, 2002 at 03:55 UTC
    Update: Whoops, guess I can't do $file =~ /$suffix$/e either...

