If you are the anonymonk who posted the reply above about Searching XML files, be aware that it would be prudent to use a proper XML parsing module if you are going to be searching for content in XML files.
If you are really familiar with and confident about how your XML files are created, and if the XML markup is simple, then sure, you can tailor a regex solution for your data, and it might be more efficient than using a parsing module. But using a parser is not so very complicated (and not so very slow, either).
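To make the regex-versus-parser trade-off concrete, here is a small sketch. The sample document is made up for illustration and is not taken from your data. It runs the same two patterns against the raw markup and against the character data that XML::Parser reports:

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document, made up for this illustration.
my $raw = <<'XML';
<doc>
  <!-- obsolete: widget -->
  <vendor>AT&amp;T</vendor>
</doc>
XML

# Collect all character data the parser reports into one string.
my $text = '';
my $parser = XML::Parser->new(
    Handlers => { Char => sub { $text .= $_[1] } } );
$parser->parse($raw);

# Regex on the raw markup: hits the comment, misses the escaped ampersand.
my $raw_widget = $raw  =~ /widget/ ? 1 : 0;
my $raw_att    = $raw  =~ /AT&T/   ? 1 : 0;
# Same patterns on the parsed text: comment ignored, entity decoded.
my $xml_widget = $text =~ /widget/ ? 1 : 0;
my $xml_att    = $text =~ /AT&T/   ? 1 : 0;

print "raw:    widget=$raw_widget AT&T=$raw_att\n";
print "parsed: widget=$xml_widget AT&T=$xml_att\n";
```

If your files never contain comments, entities or CDATA sections, the two approaches will agree and the regex is fine; the parser only earns its keep when they can differ.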
Here's a demonstration that ought to do what you want in terms of searching for content in xml files; it includes the good suggestions from the previous replies, and adds a few other tweaks as well. Note that we'll filter out all the irrelevant file names during the readdir phase:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

my ( $path, $pattern ) = @ARGV;
die "Usage: $0 path pattern\n lists files in path that contain pattern\n"
    unless ( defined $path and -d $path
             and defined $pattern and $pattern =~ /\S/ );

my $found_files = process_files( $path, $pattern );
print "the following files in $path contain '$pattern'\n",
    join( "\n", @$found_files ), "\n";

# Scan one directory (non-recursive) and return "path/file: count"
# strings for each file with at least one match.
sub process_files
{
    my ( $path, $pattern ) = @_;
    my @found = ();
    # file names to skip; /x lets the alternatives be spaced out
    my $ignore = qr/\.(?:zip|lfa|txt) | UASTG |
                    defines | sccpch | sms81154 | sms97767
                   /x;
    opendir( my $dh, $path ) or die "opendir failed on $path: $!";
    for my $file ( grep { -f "$path/$_" and !/$ignore/ } readdir $dh ) {
        my $nfound = read_file( $path, $file, $pattern );
        push @found, "$path/$file: $nfound" if ( $nfound );
    }
    closedir $dh;
    return \@found;
}

# Parse one file and count the character-data chunks that match $pattern.
sub read_file
{
    my ( $path, $file, $pattern ) = @_;
    my $nfnd = 0;
    if ( open my $fh, '<', "$path/$file" ) {
        my $xml = XML::Parser->new( Handlers =>
            { Char => sub { $nfnd++ if $_[1] =~ /$pattern/ } } );
        # parse() dies on malformed XML; trap that so one bad file
        # does not abort the whole run
        eval { $xml->parse( $fh ); 1 }
            or warn "parse failed on $path/$file: $@";
    }
    else {
        warn "open failed on $path/$file: $!\n";
    }
    return $nfnd;
}
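One caveat about the Char handler: XML::Parser may call it more than once for a single run of text (for example around an entity reference), so a pattern that spans two chunks can be missed when each chunk is tested separately. A possible workaround, sketched here under the assumption that you care about matches within the text between two tags, is to buffer the chunks and test the buffer whenever a tag is seen. The helper name count_matches and the sample string are made up for this illustration:

```perl
use strict;
use warnings;
use XML::Parser;

# Count the runs of text between tags that match $pattern.
# count_matches() is a hypothetical helper, not part of the script above.
sub count_matches {
    my ( $xml_string, $pattern ) = @_;
    my ( $count, $buffer ) = ( 0, '' );

    # Test whatever text has accumulated, then start a fresh buffer.
    my $flush = sub {
        $count++ if $buffer =~ /$pattern/;
        $buffer = '';
    };

    my $parser = XML::Parser->new(
        Handlers => {
            Start => $flush,                    # text before an opening tag
            End   => $flush,                    # text before a closing tag
            Char  => sub { $buffer .= $_[1] },  # may fire several times
        } );
    $parser->parse($xml_string);
    return $count;
}

# Hypothetical sample input.
my $sample = '<list><item>AT&amp;T</item><item>other</item></list>';
print count_matches( $sample, qr/AT&T/ ), "\n";    # prints 1
```

The same three handlers could be dropped into read_file in place of the single Char handler if chunk boundaries turn out to matter for your patterns.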
Lots of monks like to recommend other XML modules that are more elaborate or "sophisticated" than the basic XML::Parser, but for your particular case (if I understand it right), this one is a pretty good match.