http://qs321.pair.com?node_id=19212

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This program searches text files for dates. It prints the whole line surrounding each date, but it is only supposed to print the date itself. I have obviously missed something; any suggestions would be appreciated.

$dir='C:/texts/';
opendir(directory,$dir) or die "cant";
while($file=readdir directory){
    next if $file=~/^\./;
    $rfname=$dir.$file;
    # print "Found file: '$rfname'\n";
    open (CONT, $rfname);
    while (<CONT>){
        if($_=~m/[0-3]?[0-9(th)?(st)?(nd)?(rd)?]\s+(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[0-9]?[0-9]?[0-9][0-9]/ig){
            print "$file\t $_\n";
        }
        elsif($_=~m/(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[1-3]?[0-9](th)?(nd)?(st)?(rd)?\s+[0-9]?[0-9]?[0-9][0-9]/ig){
            print "$file\t $_\n";
        }
    }
}

Petruchio Thu Jul 12 01:55:32 EDT 2001: Added code tags.

RE: $_
by jjhorner (Hermit) on Jun 21, 2000 at 16:46 UTC

    When are people going to learn about the <code> notation?

    As best as I can tell, here is what it says:

    $dir='C:/texts/';
    opendir(directory,$dir) or die "cant";
    while($file=readdir directory){
        next if $file=~/^\./;
        $rfname=$dir.$file;
        # print "Found file: '$rfname'\n";
        open (CONT, $rfname);
        while (<CONT>){
            if($_=~m/[0-3]?[0-9(th)?(st)?(nd)?(rd)?]\s+(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[0-9]?[0-9]?[0-9][0-9]/ig){
                print "$file\t $_\n";
            }
            elsif($_=~m/(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[1-3]?[0-9](th)?(nd)?(st)?(rd)?\s+[0-9]?[0-9]?[0-9][0-9]/ig){
                print "$file\t $_\n";
            }
        }
    }

    Looking at your code, and another response, I see your problem. You are telling it to print the entire string.

    If you have _Learning Perl_, by our own merlyn, look at 7.3.2.3, "Parentheses as memory".

    Also, from _Programming Perl_:

         A regular expression in parentheses, (...), matches whatever the regular expression (represented by ...) matches according to Rule 2. Parentheses
         therefore serve as a grouping operator for quantification. Parentheses also have the side effect of remembering the matched substring for later use in a
         backreference (to be discussed later). This side effect can be suppressed by using (?:...) instead, which has only the grouping semantics - it doesn't store
         anything in $1, $2, and so on.
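To make the quoted passage concrete, here is a minimal sketch of how capturing and non-capturing parentheses affect the numbered variables. The sample sentence and the variable names are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the original files.
my $text = "Meeting on 21 June 2000 in Knoxville";

# Capturing groups: every (...) takes the next numbered variable,
# including the inner (e)? that exists only to make the "e" optional.
my ($day, $month, $year);
if ($text =~ /(\d{1,2})\s+(Jun(e)?)\s+(\d{4})/) {
    # Group 1 = day, group 2 = month, group 3 = the optional "e", group 4 = year
    ($day, $month, $year) = ($1, $2, $4);
}

# Non-capturing group: (?:e)? still groups for the "?" quantifier,
# but it no longer uses up a number, so the year moves to group 3.
my $year_nc;
if ($text =~ /(\d{1,2})\s+(Jun(?:e)?)\s+(\d{4})/) {
    $year_nc = $3;
}

print "capturing:     $day $month $year\n";
print "non-capturing: year is in group 3 = $year_nc\n";
```

With twelve optional month endings in the original pattern, switching those inner groups to (?:...) would keep the numbering of the groups you actually care about much easier to follow.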
    

    Look into spending $60 for the Perl CD Bookshelf. It rocks.

    J. J. Horner
    Linux, Perl, Apache, Stronghold, Unix
    jhorner@knoxlug.org http://www.knoxlug.org/
    
Re: $_
by raflach (Pilgrim) on Jun 21, 2000 at 17:15 UTC
    In actual fact, change your line like this, and you should have what you want.
    if($_=~m/([0-3]?[0-9(th)?(st)?(nd)?(rd)?]\s+(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[0-9]?[0-9]?[0-9][0-9])/ig){
        print "$file\t $1\n";
    }
    And do the same for the other line.
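One thing to watch in the pattern above: `[0-9(th)?(st)?(nd)?(rd)?]` is a single character class, so the parentheses, letters, and question marks inside the brackets are literal set members rather than optional suffixes. The sketch below assumes the intent was "day, optional ordinal suffix, month, year" and moves the suffix outside the brackets. The helper name `first_date` and the sample sentence are hypothetical.

```perl
use strict;
use warnings;

# Month names, abbreviated or spelled out; (?:...) keeps them out of the numbered variables.
my $month = qr/(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)/i;

# Hypothetical helper: returns the first "day month year" date in a line, or undef.
sub first_date {
    my ($line) = @_;
    # The ordinal suffix is an optional group placed after the digit class,
    # and the single outer pair of parentheses captures the whole date.
    return $line =~ /\b([0-3]?[0-9](?:th|st|nd|rd)?\s+$month\s+[0-9]{2,4})\b/i ? $1 : undef;
}

print first_date("Signed on 21st March 2000 in London."), "\n";
```

The same change would apply to the month-first pattern in the elsif branch, where the suffix groups are already outside the brackets.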
Re: $_
by davorg (Chancellor) on Jun 21, 2000 at 16:48 UTC

    Best to enclose this stuff in <CODE>..</CODE> tags so it looks like this...

    $dir='C:/texts/';
    opendir(directory,$dir) or die "cant";
    while($file=readdir directory){
        next if $file=~/^\./;
        $rfname=$dir.$file;
        # print "Found file: '$rfname'\n";
        open (CONT, $rfname);
        while (<CONT>){
            if($_=~m/[0-3]?[0-9(th)?(st)?(nd)?(rd)?]\s+(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[0-9]?[0-9]?[0-9][0-9]/ig){
                print "$file\t $_\n";
            }
            elsif($_=~m/(Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?)\s+[1-3]?[0-9](th)?(nd)?(st)?(rd)?\s+[0-9]?[0-9]?[0-9][0-9]/ig){
                print "$file\t $_\n";
            }
        }
    }

    Looking at your code, it prints out the name of the file and the complete line when the line matches the regular expression. If that's not what you want then you'll need to capture part of the match using brackets and print the value of $1, not $_.
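As a rough illustration of the difference between `$_` and `$1`, the sketch below collects only the captured dates from each line. It assumes a line may hold more than one date, which is what the /g modifier is useful for. The sample lines are invented and the month list is cut down to two names to keep the pattern short.

```perl
use strict;
use warnings;

# Hypothetical input lines standing in for the contents of one file.
my @lines = (
    "The contract was signed on 3 May 1999 and renewed on 12 Jun 2000.",
    "Nothing to see on this line.",
);

my @found;
for my $line (@lines) {
    # In scalar context, a /g match resumes where the previous one ended,
    # so the while loop visits every date on the line. Group 1 holds only
    # the matched date, whereas $line still holds the complete line.
    while ($line =~ /\b(\d{1,2}\s+(?:May|Jun)\s+\d{4})\b/g) {
        push @found, $1;
    }
}

print "$_\n" for @found;
```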


    --
    <http://www.dave.org.uk>

    European Perl Conference - Sept 22/24 2000
    <http://www.yapc.org/Europe/>
      Please DON'T do that.
      When using the <code> tags, or <pre> tags for that matter, please avoid long lines. They mess up the whole page for everyone. Thank you.
Re: $_ ('x' regex modifier)
by Russ (Deacon) on Jun 22, 2000 at 01:18 UTC
    While on the discussion of code tags, let me mention the x regexp modifier. It lets you use whitespace in your regular expressions for greater readability.
    m/[0-3]? [0-9(th)?(st)?(nd)?(rd)?]
      \s+
      (Jan(uary)?|
       Feb(ruary)?|
       Mar(ch)?|
       Apr(il)?|
       May|
       Jun(e)?|
       Jul(y)?|
       Aug(ust)?|
       Sep(tember)?|
       Oct(ober)?|
       Nov(ember)?|
       Dec(ember)?)
      \s+
      [0-9]?[0-9]?[0-9][0-9]/igx;
    The extra whitespace makes it far easier to follow (and doesn't make the browser screen width 4000 pixels wide. ;-)
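The x modifier also allows comments inside the pattern, which helps once a regex grows this long. Below is a small sketch under a few assumptions: the month list is shortened to three names, the ordinal suffix is written as an optional group, and the sample string is invented. Note that the comments avoid dollar signs and slashes, since Perl would still interpolate variables in them and would treat a slash as the end of the pattern.

```perl
use strict;
use warnings;

my $date = qr/
    \b
    ( \d{1,2} )                 # group 1: day of the month
    (?: th | st | nd | rd )?    # optional ordinal suffix, not captured
    \s+                         # literal whitespace must be spelled out under x
    ( Jan | Feb | Mar )         # group 2: month, shortened list for this sketch
    \s+
    ( \d{2,4} )                 # group 3: two- to four-digit year
    \b
/ix;

# In list context the match returns the captured groups in order.
my @parts = "Due 2nd Feb 2001" =~ $date;
print join('-', @parts), "\n";
```

Compiling the pattern once with qr and reusing it in both branches of the if/elsif would also remove the need to repeat the month list.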

    Russ