PerlMonks  

Tricky regexp

by riceboyyy (Novice)
on Jun 26, 2011 at 02:40 UTC ( [id://911425]=perlquestion )

riceboyyy has asked for the wisdom of the Perl Monks concerning the following question:

Here is a program I made to search for a phrase in the files of a directory. I'm a bit new to Perl, so please point out any other corrections that can be made:
    #!/usr/bin/perl
    use warnings;
    use strict;
    #regexp.pl

    print "Gimme the address of the directory:\n";
    my $folder = <>;
    chomp $folder;

    opendir(DIR, $folder) or die "We got a problem: $!";
    my @files = readdir(DIR);

    print "What's the phrase you're looking for?\n";
    my $find = <>;
    chomp $find;

    my @found;

    foreach (@files){
        open(FILE, $_);
        my @FILE = <FILE>;

        if (@FILE =~ /$find/i){
            push (@found, $_);
        } else {
            next;
        }
    }

    print "Your query was found in the following files:\n";
    print "@found\n";
So I tried running this, and it complains: "readline() closed on filehandle FILE at regexp.pl line 21" Line 21, by the way, is the line containing:
my @FILE = <FILE>;

Replies are listed 'Best First'.
Re: Tricky regexp
by graff (Chancellor) on Jun 26, 2011 at 03:23 UTC
    First, if I understand your stated task, the common Unix/GNU "grep" command already does what you want:
        cd /path/to/search
        grep -l pattern_to_find *
    The two shell commands above do exactly what you were trying to do in Perl (and if you're on a Windows system, the GNU "bash" shell and "grep" are available for your OS -- since you have Perl, you should know about these other tools).

    There are a few problems with the OP script:

    • You opendir() a path provided by the user and get file names, which is fine, but then you either need to chdir to that path, or else prepend the path string to each file name in order to open the file successfully. The OP script doesn't do either of those things.
    • You read the full content of every file into memory, but you don't need to do that; just read a line at a time until you either (a) reach EOF, or (b) find the first occurrence of the target pattern. In the latter case, you can print the file name, stop reading that file, and move on to the next. Big files will only increase the memory footprint if they happen to be binaries without any embedded line-breaks.
    • You do a regex match on an array, but the =~ operator is supposed to be used on a single scalar value (one string) -- i.e. on each element of the array. That's another good reason just to read one line at a time, to check each line against the regex, and not use an array for file data.
    • (added as an update:) You require the user to type input to the script after it starts running, rather than getting all the required user input from command-line args (using @ARGV) -- that gets really tiresome.
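    To illustrate the point about matching against an array, here is a minimal sketch (not the OP's script). The sample data and the helper name any_line_matches are hypothetical. An array evaluated in scalar context yields its element count, so a match bound directly to an array would be tested against that number rather than against the text.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of a file.
my @lines = ("alpha\n", "beta\n", "gamma\n");

# An array in scalar context yields its element count, so a match bound
# directly to the array would be tested against this number, not the text.
my $count = @lines;

# Hypothetical helper: true if any element of the list matches the pattern.
sub any_line_matches {
    my ($pattern, @candidates) = @_;
    for my $line (@candidates) {
        return 1 if $line =~ /$pattern/i;   # test one scalar at a time
    }
    return 0;
}

print "element count: $count\n";
print "matches 'BETA': ", any_line_matches('BETA', @lines), "\n";
```

    Checking one scalar at a time also lets the loop stop at the first hit, which fits the line-at-a-time reading suggested above.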
    I'm not sure if you've given us the exact wording of the error message you got, and I'm not sure why you got a message like "readline() on closed filehandle" -- but that's the least of your problems. If you really don't want to use the existing "grep" command (e.g. if you want to use a regex that only Perl will support), then try something like this:
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $Usage = "Usage: $0 [-p path/to/search] regex\n";

        if ( @ARGV > 2 and $ARGV[0] eq '-p' ) {
            shift;
            chdir $ARGV[0] or die "Can't chdir to $ARGV[0]: $!\n";
            shift;
        }
        die $Usage unless ( @ARGV == 1 );
        my $regex = shift;

        opendir( D, '.' );
        my @files = grep { -f } readdir D;  # we only want to look at data files
        my @matches;
        for my $f ( @files ) {
            open( F, $f ) or do { warn "open failed for $f: $!\n"; next; };
            while (<F>) {
                if ( m{$regex} ) {
                    push @matches, $f;
                    last;
                }
            }
        }
        print "The pattern {$regex} was found in ", scalar @matches, " files:\n";
        print "@matches\n";
Re: Tricky regexp
by wind (Priest) on Jun 26, 2011 at 03:01 UTC

    Add error checking to your open statements. You are only referencing the filename and not the full path, so it can't find the file.

    The following is how I'd clean up your script:

        #!/usr/bin/perl
        #regexp.pl
        use File::Spec;

        use strict;
        use warnings;

        print "Gimme the address of the directory:\n";
        chomp(my $folder = <>);

        print "What's the phrase you're looking for?\n";
        chomp(my $find = <>);

        my @found;

        opendir my $dh, $folder or die "Can't open $folder: $!";
        # defined() keeps the loop going for a file that happens to be named "0"
        while (defined(my $file = readdir($dh))) {
            next if $file =~ /^\.+$/;               # skip . and ..
            my $path = File::Spec->catfile($folder, $file);
            next if ! -f $path;                     # plain files only

            open my $fh, '<', $path or die "Can't open $path: $!";
            my $data = do {local $/; <$fh>};        # slurp the whole file
            close $fh;

            # \Q...\E treats the phrase as literal text, not as a regex
            if ($data =~ /\Q$find\E/i){
                push @found, $file;
            }
        }
        closedir $dh;   # directory handles are closed with closedir, not close

        print "Your query was found in the following files:\n";
        print "@found\n";
Re: Tricky regexp
by ww (Archbishop) on Jun 26, 2011 at 04:36 UTC
    If you're working in your target directory, the error message you've asked about appears because you're trying to open the parent directory (..), the current directory (.) and subdirectories, if any, as if they were files, because opendir captured those to your array of files.

    Knock those out of your @files before trying to open anything. (see http://perldoc.perl.org/perlfunc.html)
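    One way to knock those entries out, sketched under assumptions: the helper name list_plain_files is hypothetical, and File::Spec's no_upwards is used here to drop . and .. before each name is joined to its directory. This is an illustration, not the only approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return full paths of the plain files in $dir.
sub list_plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't opendir $dir: $!";
    my @paths =
        grep { -f }                              # keep plain files only
        map  { File::Spec->catfile($dir, $_) }   # prepend the directory
        File::Spec->no_upwards(readdir $dh);     # drop . and ..
    closedir $dh;
    return @paths;
}

print "$_\n" for list_plain_files('.');
```

    Because the -f test runs on the full path, subdirectories are filtered out as well, and the returned names can be passed straight to open.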

    You have some other problems in this script. Pay special attention to graff's discussion of path. Some will be easily solved if you add use diagnostics; to your pragmata; some are in the nature of your failure to test the open at line 21 (and -- in the same line -- failure to use what's now considered best practice: 3-arg open with lexical filehandles).
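    A minimal sketch of a tested 3-arg open with a lexical filehandle follows. The helper name first_line is hypothetical and exists only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first line of a file, or undef on failure.
sub first_line {
    my ($path) = @_;

    # 3-arg open: the mode '<' is separate from the name, so a file name
    # containing characters such as '>' or '|' cannot change the open mode.
    open(my $fh, '<', $path) or do {
        warn "Can't open $path: $!\n";
        return;
    };

    my $line = <$fh>;        # $fh is lexical and scoped to this sub
    close $fh;
    chomp $line if defined $line;
    return $line;
}

# Usage (the path is a placeholder): my $line = first_line('/path/to/file');
```

    Testing the return value of open means a failure is reported with the reason in $! at the point where it happens, instead of surfacing later as a readline warning.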

    Perhaps most critical among the problems you didn't ask about is the attempt at line 23 to test an array (in scalar context) for a match -- and you need to read about qr/.../ (see qr in perlop, under Quote and Quote-like Operators) to make your regex match what you expect... and at line 24, where you'll find you're pushing something quite unexpected onto @found.
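    As a small illustration of qr/.../, the sketch below compiles the same user-supplied string two ways. The sample input 'a.b' is hypothetical, and the use of \Q...\E to match the phrase literally is an assumption about what the search is meant to do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $find = 'a.b';   # hypothetical user input containing a regex metacharacter

my $as_regex   = qr/$find/i;       # "." matches any single character
my $as_literal = qr/\Q$find\E/i;   # "." matches only a literal dot

for my $text ('a.b', 'AxB') {
    printf "%-4s regex:%s literal:%s\n", $text,
        ($text =~ $as_regex   ? 'yes' : 'no'),
        ($text =~ $as_literal ? 'yes' : 'no');
}
```

    Compiling the pattern once with qr also puts the flags, such as /i, in one place instead of repeating them at every match.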

    Updated for grammar, markup and clarity

    Update 2 (Warning: Sunday morning content): This is one way of attacking your target (and problem) that's along the lines you initially tried:

        #!/usr/bin/perl
        use warnings;
        use strict;
        # use diagnostics;
        #regexp.pl
        # 911425

        print "Enter the full path to the directory you want to search: ";
        my $folder = <>;
        chomp $folder;
        chdir($folder);

        opendir(DIR, $folder) or die "Can't open $folder, $!";
        my @files = readdir(DIR) or die "Can't readdir $folder, $!";

        print "What's the phrase you're looking for?: ";
        my $find = <STDIN>;
        chomp $find;
        my $searchterm = qr/$find/;

        my (%found, $found, $file);

        for $file(@files) {
            next if ($file =~ /^\./);
            next unless (-T $file);   # text files only (excludes binary files such as *.doc or .xls)
            open(my $fh, '<', $file) or die "Can't open $file: $!";
            my @content = <$fh>;
            for my $line(@content) {
                if ($line =~ /$searchterm/i) {
                    my $key = $file;
                    $found{$key} += 1;
                }
            }
        }

        while (my ($key, $value) = each %found) {
            print "$key has \t $value instance(s) of \t \"$find\"\n";
        }
Re: Tricky regexp
by 7stud (Deacon) on Jun 27, 2011 at 10:54 UTC

    The glob() function will tack the file name onto the path specified for the search directory, and it will skip hidden files that start with a dot:

        use strict;
        use warnings;
        use 5.010;

        my @files = glob "/users/me/*";

        for (@files) {
            say;
        }

        --output:--
        /users/me/066.JPG
        /users/me/069.JPG
        /users/me/072.JPG
        /users/me/077.JPG
        /users/me/079-1.JPG
        /users/me/079.JPG
        /users/me/081-1.JPG
        /users/me/081.JPG
        /users/me/1.txt
        /users/me/1perl.pl
        ...
        ...
