PerlMonks  

Help with a faster loop

by gzayzay (Sexton)
on Mar 01, 2006 at 14:33 UTC

gzayzay has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am writing a search engine to sort through a lot of data. I have developed a loop that works fine but is very slow, so I would like suggestions for a faster loop. Below is my code; I would like to make it faster and smarter.

sub process_files {
    if (opendir(TEST, $dir_path)) {
        @files = sort grep{$_ ne '.' and $_ ne '..'} readdir(TEST);
        #print "\n@files[0]\n";
        for ($j = 0; $j <= $#files; $j++) {
            $file_path = $dir_path."\\".@files[$j];
            read_files();
        }
    }
    else {
        die ("Could not Opendir $!\n");
    }
    closedir TEST;
}

sub read_files {
    if ($file_path =~ /defines|sccpch|sms81154|sms97767/) {
        next;
    }
    elsif(!($file_path =~ /defines|sccpch|sms81154|sms97767/)) {
        if (open(FILES, $file_path)) {
            if (!($file_path =~ /\.lfa|\.zip|\.txt|UASTG/)) {
                print (DATA "$file_path\n");
            }
        }
        else {
            die ("Could not open[$file_path], $!\n");
        }
    }
    close FILES;
}

Thanks,

Edman

Replies are listed 'Best First'.
Re: Help with a faster loop
by JediWizard (Deacon) on Mar 01, 2006 at 14:44 UTC

    if ($file_path =~ /defines|sccpch|sms81154|sms97767/) {
        next;
    }
    elsif(!($file_path =~ /defines|sccpch|sms81154|sms97767/)) # This is redundant
    {
        if (open(FILES, $file_path)) # use the -f operator to test
                                     # file's existence, or -W to test
                                     # if it can be written to
        {
            if (!($file_path =~ /\.lfa|\.zip|\.txt|UASTG/)) {
                print (DATA "$file_path\n");
            }
        }
        else {
            die ("Could not open[$file_path], $!\n");
        }
    }

    ## The above is redundant... it would be better stated as:

    if ($file_path =~ /defines|sccpch|sms81154|sms97767/) {
        next;
    }
    else {
        if (-f $file_path) {
            if (!($file_path =~ /\.lfa|\.zip|\.txt|UASTG/)) {
                print (DATA "$file_path\n");
            }
        }
        else {
            die ("Could not open[$file_path], $!\n");
        }
    }

    They say that time changes things, but you actually have to change them yourself.

    —Andy Warhol

      I think the code below looks cleaner. Depending on what your code does, you can use the Memoize module to keep a cache of files you have already checked. Passing $dir_path and $file_path to the subroutines is definitely a good idea. My point here is that it is best to keep subroutines self-contained and not use external variables:
      use strict;
      use Memoize;
      memoize ('read_files');

      sub process_files {
          my $dir_path = shift;
          if (opendir(TEST, $dir_path)) {
              my @files = sort grep{$_ ne '.' and $_ ne '..'} readdir(TEST);
              #print "\n@files[0]\n";
              read_files("$dir_path\\$_") foreach (@files);
          }
          else {
              die ("Could not Opendir $dir_path: $!\n");
          }
          closedir TEST;
      }

      sub read_files {
          my $file_path = shift;
          if (-f $file_path) {
              print (DATA "$file_path\n") if ($file_path !~ /(\.lfa|\.zip|\.txt|UASTG)$/);
          }
          else {
              die ("Could not open[$file_path], $!\n");
          }
      }
      The only global here is the DATA filehandle, but you should pass that along through the subroutine chain as a parameter.
      -imran
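      As a rough illustration of passing the output handle down the subroutine chain instead of relying on a global, here is a minimal sketch. The name report_file is hypothetical (it is not from the posts above), and the in-memory handle is used only so the example runs on its own; a real script would open an output file.

```perl
use strict;
use warnings;

# Hypothetical variant of read_files: the output handle is an argument,
# not a global such as DATA.
sub report_file {
    my ( $out, $file_path ) = @_;

    # Same skip list as in the original post.
    return 0 if $file_path =~ /\.lfa|\.zip|\.txt|UASTG/;

    print {$out} "$file_path\n";
    return 1;
}

# Demonstration with an in-memory handle (assumption: a real script
# would open a file on disk here instead).
my $buffer = '';
open( my $out, '>', \$buffer ) or die "Could not open in-memory handle: $!\n";
report_file( $out, $_ ) for ( 'logs\\a.xml', 'logs\\b.zip', 'logs\\c.dat' );
close $out;

print $buffer;
```

      Because the handle is a lexical passed as an argument, the subroutine can be reused with any destination (a file, STDOUT, or a string buffer in a test) without touching package globals.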
        Thanks a lot for your assistance.

        Some of the files I will be opening are XML files. Is it possible to get information from the tags without using an XML parser module?

        Example: if an XML file has a tag <edman> Hello World! </edman> within it, and I want to look for "Hello" within any files, how do I read between the tags?

      Thanks Andy, I used your modification and it is working. Edman
Re: Help with a faster loop
by graff (Chancellor) on Mar 02, 2006 at 05:15 UTC
    If you are the anonymonk who posted the reply above about Searching XML files, be aware that it would be prudent to use a proper XML parsing module if you are going to be searching for stuff in xml files.

    If you are really familiar with and confident about how your xml files are created, and if the xml markup is simple, then sure, you can tailor a regex solution for your data, and it might be more effective/efficient than using a parsing module. But using a parser is not so very complicated (and not so very slow, either).
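    For the simple-markup case, a regex approach might look like the sketch below. It assumes tags with no attributes, no nesting of the same tag, and no CDATA sections or comments; the helper name tag_contents is hypothetical and the sample string is built from the example in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text found between <$tag> and </$tag>.
# Only suitable for simple, well-understood markup (see the assumptions above).
sub tag_contents {
    my ( $xml, $tag ) = @_;
    my @contents;

    # \Q...\E quotes the tag name, /s lets "." match newlines,
    # and .*? keeps each match as short as possible.
    while ( $xml =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}gs ) {
        push @contents, $1;
    }
    return @contents;
}

my $sample = "<doc><edman> Hello World! </edman><other>Goodbye</other></doc>";
my @hits   = grep { /Hello/ } tag_contents( $sample, 'edman' );
print scalar(@hits), " matching element(s)\n";
```

    Anything beyond this (attributes, entities, nested elements) is where a regex starts to break down and a parser becomes the safer choice.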

    Here's a demonstration that ought to do what you want in terms of searching for content in xml files; it includes the good suggestions from the previous replies, and adds a few other tweaks as well. Note that we'll filter out all the irrelevant file names during the readdir phase:

    #!/usr/bin/perl
    use strict;
    use XML::Parser;

    my ( $path, $pattern ) = @ARGV;
    die "Usage: $0 path pattern\n lists files in path that contain pattern\n"
        unless ( length($path) and -d $path and $pattern =~ /\S/ );

    my $found_files = process_files( $path, $pattern );
    print "the following files in $path contain '$pattern'\n",
        join( "\n", @$found_files ), "\n";

    sub process_files {
        my ( $path, $pattern ) = @_;
        my @found = ();
        my $ignore = qr/\.(?:zip|lfa|txt) | UASTG | defines | sccpch | sms81154 | sms97767 /x;
        opendir( D, $path ) or die "opendir failed on $path: $!";
        for my $file ( grep { -f "$path/$_" and !/$ignore/} readdir D ) {
            my $nfound = read_file( $path, $file, $pattern );
            push @found, "$path/$file: $nfound" if ( $nfound );
        }
        closedir D;
        return \@found;
    }

    sub read_file {
        my ( $path, $file, $pattern ) = @_;
        my $nfnd = 0;
        if ( open my $fh, "$path/$file" ) {
            my $xml = new XML::Parser(
                Handlers => { Char => sub { $nfnd++ if $_[1] =~ /$pattern/ } }
            );
            $xml->parse( $fh );
        }
        else {
            warn "open failed on $path/$file: $!\n";
        }
        return $nfnd;
    }
    Lots of monks like to recommend other XML modules that are more elaborate or "sophisticated" than the basic XML::Parser, but for your particular case (if I understand it right), this one is a pretty good match.
      Thanks for this code. It looks very cool. My concern about using the XML parser is that if I want to share my code with another person, I think they will need to have the XML parser module installed before they can run it. If XML::Parser is a core module, then that takes care of the problem. However, I don't think it is a core module. Correct me if I am wrong.

      Again, I really like your code. I will be using it for internal purposes, since I have the XML parser installed on my machine.

      Edman
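      On the sharing concern: as far as I know, XML::Parser is not part of the core Perl distribution, so a script meant for other machines could check for it at run time before relying on it. The sketch below is one way to do that; module_available is a hypothetical helper name, and what to do when the module is missing is left as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded on this machine.
sub module_available {
    my ($module) = @_;

    # Turn a module name such as XML::Parser into the file name XML/Parser.pm.
    ( my $file = "$module.pm" ) =~ s{::}{/}g;

    # require dies if the file cannot be found or compiled; eval traps that.
    return eval { require $file; 1 } ? 1 : 0;
}

if ( module_available('XML::Parser') ) {
    print "XML::Parser found; the parser-based search can be used\n";
}
else {
    print "XML::Parser not installed; install it from CPAN or fall back to a regex search\n";
}
```

      Checking up front lets the script print a clear message instead of failing at compile time on a machine that lacks the module.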
