http://qs321.pair.com?node_id=655252

paulnovl has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm grepping with Perl, but want to stop after the first match.

My input file contains several identical 2-line strings like this (scattered amongst other data):

# /proc/cpuinfo
vendor_id : IBM/S390

This is what I have so far:
m#cpuinfo\nvendor_id\s+:\s(.*?)\n# and print "CPU:\t$1";

That returns:
CPU: IBM/S390 CPU: IBM/S390 CPU: IBM/S390

How can I tell it to stop searching after the first match?

Thanks.

Replies are listed 'Best First'.
Re: Grepping with Perl: How to stop after first match?
by johngg (Canon) on Dec 05, 2007 at 23:47 UTC
    Assuming you are reading your input file in a while loop you could try this.

    while ( <$inFH> ) {
        m#cpuinfo\nvendor_id\s+:\s(.*?)\n# and print "CPU:\t$1" and last;
    }

    I hope this is helpful.

    Cheers,

    JohnGG

      Thanks for the suggestion, but as I'm not reading the input in a loop, I can't see a way to implement it.
      My input is in the form of a stream consisting of many tar'ed text files written to standard-out when they're extracted by bzcat (from .tar.bz2 files).
      I'm trying to use perl to do the grepping, but that's only a part of a larger shell script that looks something like this (snipped for clarity):
      #! /bin/sh
      ## other stuff happens here, but snipped for clarity
      scdir=/var/log
      find $scdir -type f -name "nts_*.bz2" |   # find interesting files in $scdir
      perl -wnl -e '7 > -M and print;' |        # ignore old files
      xargs bzcat -k |                          # unpack contents to STDOUT
      perl -wn00 -e '                           # paragraph mode
          m[kernel\.hostname\s.\s(.*?)\n] and print "Hostname:\t$1\n";   # grep for hostname
          m[\/bin\/date\n(.*?)\n] and print "Generated:\t$1\n";          # grep for date
          ## I grep for other stuff here, but snipped for clarity
          m[Settings.for\s(.*?):\n] and print "Interface:\t$1\n";        # grep for eth interfaces
          m[Speed:\s(.*?)Mb\/s] and                                      # grep for interface speed
              print "\ -speed:\t$1Mb\/s\n" ;
          m[Duplex:\s(.*?)\n] and                                        # grep for interface duplex
              print "\ -duplex:\t$1\n" ;
          m[Auto-negotiation:\s(.*?)\n] and                              # grep for interface autoneg
              print "\ -autoneg:\t$1\n" ;
          m[cpuinfo\nvendor_id\s+:\s(.*?)\n] and                         # grep for cpu id
              print "CPU:\t\t$1\n" ;
      '
      exit $!

      I'm a Perl novice, so am open to suggestions that I'm not approaching this the right way.
        My first thought is, "no, you're probably not using the right tools for what you want to do."

        I would attempt the whole thing in Perl for starters. Have a look at opendir, readdir and closedir along with grep for finding the compressed tar archives you want to work with and placing the names in an array. Having found them I would then loop over them using CPAN modules to uncompress (Compress::Bzip2) and read (Archive::Tar) the archives. I would read each file in the archive into a string so that I could do as many or as few matches as I wanted.
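
        A minimal sketch of the first step described above (finding the archives with opendir, readdir and grep), assuming the nts_*.bz2 naming and the 7-day cutoff from the original shell script. The sub name recent_archives is hypothetical, and unlike find it does not descend into subdirectories. Uncompressing and reading the archives is not shown.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: return the nts_*.bz2 files directly inside $dir
        # that were modified less than $max_age_days days ago.
        sub recent_archives {
            my ($dir, $max_age_days) = @_;
            opendir my $dh, $dir or die "Cannot open directory '$dir': $!\n";
            my @names = grep {
                my $path = "$dir/$_";
                # name matches, is a plain file, and is recent enough
                /^nts_.*\.bz2\z/ && -f $path && (-M $path) < $max_age_days;
            } readdir $dh;
            closedir $dh;
            return map { "$dir/$_" } sort @names;
        }
        ```

        It could be called as recent_archives('/var/log', 7) and the result looped over, one archive at a time.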

        I should stress that I have never done anything like this before but I think you are more likely to meet with success by adopting this approach. I hope these thoughts will help you towards a solution but feel free to ask further if things aren't clear.

        Cheers,

        JohnGG

Re: Grepping with Perl: How to stop after first match?
by doom (Deacon) on Dec 05, 2007 at 23:52 UTC
    The standard perl module List::Util has a "first" that does exactly what you want.

    use List::Util qw( first );
    open my $fh, '<', $your_file;
    my $line = first { m{ $your_pattern }x } <$fh>;

      "first" ... does exactly what you want

      "Exactly" is a strong word. If he's trying to avoid reading through the whole file he won't achieve that by using first. To some, first may seem to lazily evaluate its list arguments, but I'd like to emphasize that this is not the case. first will cause the whole file to be slurped into memory.

      lodin

      Update: changed "list" to "argument" to be more technically (?) correct.
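
      For comparison, a minimal sketch of a reader that really does stop at the first match: a plain while loop that returns as soon as the pattern matches, so nothing after that line is read from the handle. The sub name first_match_lazy and the returned line count are illustrative assumptions, not part of List::Util.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: read $fh one line at a time and stop at the first
      # line matching $re. Returns the line (or undef) and the number of lines read.
      sub first_match_lazy {
          my ($fh, $re) = @_;
          my $read = 0;
          while (my $line = <$fh>) {
              $read++;
              return ($line, $read) if $line =~ $re;
          }
          return (undef, $read);
      }
      ```

      This only handles single-line patterns; it does not address the 2-line string in the original question.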

        Almost there. first itself is (almost) perfectly respectful of a lazy @_. In List::Util, the pure-perl version of first() is:
        sub first (&@) {
            my $code = shift;
            foreach (@_) {
                return $_ if &{$code}();
            }
            undef;
        }
        The issue is (almost) completely that @_ is not lazy in the slightest. This is one of the biggest changes no-one will notice in Perl6 - the addition of the concept of truly lazy lists. You could code up something lazier with Tie::Array::Lazy. Maybe something like:
        open my $fh, '<', $filename
            or die "Cannot open '$filename' for reading: $!\n";
        tie my @arr, 'Tie::Array::Lazy', [], sub { scalar <$fh> };
        first { <whatever> } @arr;
        (Note that this is completely untested - I've never used Tie::Array::Lazy in my life.)

        My criteria for good software:
        1. Does it work?
        2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Grepping with Perl: How to stop after first match?
by sh1tn (Priest) on Dec 05, 2007 at 23:48 UTC
    Use last inside your loop.


Re: Grepping with Perl: How to stop after first match?
by BrowserUk (Patriarch) on Dec 06, 2007 at 10:25 UTC

    People seem to have missed that you are looking for "2-line strings". Reading and matching one line at a time won't succeed.

    You need to maintain a rolling two-line buffer to do this.

    ...
    ## get $fh from somewhere (eg.open)
    my $last = '';
    while( <$fh> ) {
        ( $last . $_ ) =~ m[cpuinfo\nvendor_id\s+:\s(.*?)\n]
            and print qq[CPU:$1]
            and last;
        $last = $_;
    }
    ...

    Or as a (longish) one-liner

    perl -ne"($last.$_)=~m[cpuinfo\nvendor_id\s+:\s(.*?)\n]&&print qq[CPU:$1]&&exit;$last=$_" yourfile

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thanks for the suggestion. I think that comes close to what I'm trying to achieve. Problem is that the "exit" doesn't allow me to continue with further grepping of the input stream.

      Here's a more complete snippet of the shell script that I'm using, showing that I'm just trying to use Perl to grep the incoming data stream (in paragraph mode).
      #! /bin/sh
      ## other stuff happens here, but snipped for clarity
      scdir=/var/log
      find $scdir -type f -name "nts_*.bz2" |   # find interesting files in $scdir
      perl -wnl -e '7 > -M and print;' |        # ignore old files
      xargs bzcat -k |                          # unpack contents to STDOUT
      perl -wn00 -e '                           # paragraph mode
          m[kernel\.hostname\s.\s(.*?)\n] and print "Hostname:\t$1\n";   # grep for hostname
          m[\/bin\/date\n(.*?)\n] and print "Generated:\t$1\n";          # grep for date
          ## I grep for other stuff here, but snipped for clarity
          m[Settings.for\s(.*?):\n] and print "Interface:\t$1\n";        # grep for eth interfaces
          m[Speed:\s(.*?)Mb\/s] and                                      # grep for interface speed
              print "\ -speed:\t$1Mb\/s\n" ;
          m[Duplex:\s(.*?)\n] and                                        # grep for interface duplex
              print "\ -duplex:\t$1\n" ;
          m[Auto-negotiation:\s(.*?)\n] and                              # grep for interface autoneg
              print "\ -autoneg:\t$1\n" ;
          m[cpuinfo\nvendor_id\s+:\s(.*?)\n] and                         # grep for cpu id
              print "CPU:\t\t$1\n" ;
      '
      exit $!

      The output is something like the following:
      Generated: Tue Dec 4 12:12:01 GMT 2007
      Hostname: lnx0010
      CPU: IBM/S390
      CPU: IBM/S390
      Interface: hsi0
       -speed: 100Mb/s
       -duplex: Full
       -autoneg: on
      Interface: hsi1
       -speed: 100Mb/s
       -duplex: Full
       -autoneg: on
      Interface: sit0
       -speed: 100Mb/s
       -duplex: Full
       -autoneg: on
      CPU: IBM/S390
      Generated: Mon Dec 3 12:34:19 GMT 2007
      Hostname: ptsuse3
      CPU: GenuineIntel
      Interface: eth0
       -speed: 100Mb/s
       -duplex: Full
       -autoneg: on
      Interface: eth1
       -speed: 100Mb/s
       -duplex: Full
       -autoneg: on
      Generated: Mon Jun 18 15:33:40 BST 2007
      Hostname: icore-71
      CPU: GenuineIntel
      CPU: GenuineIntel
      Interface: eth0
       -speed: 100Mb/s
       -duplex: Half
       -autoneg: on

      You'll notice that there are several "CPU:" lines, and that's because there are several occurrences of the following 2-line string in the input stream:
      # /proc/cpuinfo
      vendor_id : IBM/S390

      I want to stop matching after the first occurrence of that string, but continue with my other "grep" statements.

      I'm new to Perl, so am open to criticism of my general approach to this problem.
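
      One possible way to get that behaviour is a flag that remembers whether a CPU line has already been printed, sketched below. The sub name summarize is hypothetical, and the reset of the flag whenever a hostname paragraph is seen is an assumption about where one report ends and the next begins in the stream; only three of the patterns from the script are shown.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: takes paragraphs (as read in paragraph mode) and
      # returns the report lines, with at most one CPU line per hostname.
      sub summarize {
          my @paragraphs = @_;
          my @out;
          my $cpu_seen = 0;
          for (@paragraphs) {
              if (m[kernel\.hostname\s.\s(.*?)\n]) {
                  push @out, "Hostname:\t$1";
                  $cpu_seen = 0;    # assumption: a hostname starts a new report
              }
              if (!$cpu_seen and m[cpuinfo\nvendor_id\s+:\s(.*?)\n]) {
                  push @out, "CPU:\t\t$1";
                  $cpu_seen = 1;    # later cpuinfo paragraphs are skipped
              }
              push @out, "Interface:\t$1" if m[Settings.for\s(.*?):\n];
          }
          return @out;
      }
      ```

      Inside a perl -n one-liner the flag would have to be a package variable rather than a my variable, because a my variable declared in the -e code is created afresh for every record.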
Re: Grepping with Perl: How to stop after first match?
by fenLisesi (Priest) on Dec 06, 2007 at 10:48 UTC
    Some secondary points:

    • I think # is not the best delimiter.
    • dot in a default regex does not match a newline, so you probably don't need the last \n in there (Update: if you make the dot-star greedy)
    • Did you slurp the file? If not, what is your input record separator? Depending on the structure of the file and whether you are doing other matches with the data, you may be able to make use of the input record separator to simplify your regex.
    • I would recommend an if block, but if you want to use and for flow control, consider putting the and at the beginning of the second line
    • What is the purpose of (.*?)?
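
    To illustrate the point about the input record separator: setting $/ to the empty string puts Perl in paragraph mode, so each read returns a whole blank-line-separated block and a two-line pattern can match within one record. This is a minimal sketch; the sub name read_paragraphs is hypothetical.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: return the contents of $fh as a list of paragraphs.
    sub read_paragraphs {
        my ($fh) = @_;
        local $/ = '';             # empty string selects paragraph mode
        my @paragraphs = <$fh>;    # one element per blank-line-separated block
        return @paragraphs;
    }
    ```

    The -00 switch on the perl command line selects the same mode for a -n or -p loop.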
Re: Grepping with Perl: How to stop after first match?
by Anonymous Monk on Mar 21, 2013 at 23:09 UTC

    Why has nobody mentioned the ? delimiter for m?

    I know this is several years too late to help the OP, but since it is coming up on top of a Google search, there must be a lot of people looking for something similar.

    Answer: if you search with m?abc? then IT WILL ONLY MATCH ONCE. (This applies to m only: in s?abc?def? the ? is an ordinary delimiter and the substitution can fire every time.) The pattern can be reset to fire again with the reset operator. It is common to do that at end-of-file if you are processing many files and want to treat each one as a self-contained item to search. Further down is a sample of code showing this.

    Without reading the full details of the OP's question (kind of pointless answering it exactly since it's so old) I think the OP should have had "state machines" explained to him. Since he was new to Perl but perhaps not programming in general, that might have been enough clue for him to realise how he might match patterns across multiple lines.

    Following is a decent template which illustrates a simple state machine, and uses one-shot patterns:

    my $state = 0;
    while (<>) {
        if ($state == 0 && /CPU/) {
            $state = 1;
        }
        elsif ($state == 1 && m?vendor?) {    # one-shot match (the m is required since Perl 5.22)
            $state = 2;
            # YIPPEE! process the data
        }
        else {
            $state = 0;    # fail
        }
    }
    continue {
        if (eof) {         # do NOT put () on eof - RTFM
            close ARGV;    # resets $. so line numbering restarts for the next file
            $state = 0;
            reset;         # re-arm the m?? patterns
        }
    }

    Of course, the state machine here actually takes over the task of matching only once, so the m?? is not strictly necessary here. If you didn't need to match across lines (let's say you're only looking for the first CPU) then you'd use m?CPU? and probably ignore the state machine functionality.

    Another alternative (as mentioned) is "slurp"ing the contents, then multi-line patterns CAN be made to work with appropriate flags on the pattern. Once again, using m?? for the multi-line pattern will ensure it only works once (per file with reset), and the state machine wouldn't be needed in this example.
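
    A minimal sketch of that last alternative, assuming each file has already been split into paragraphs (for example by reading it in paragraph mode). The sub name first_cpu_per_file and the array-ref-per-file input are illustrative assumptions. The capture uses a greedy (.*) rather than (.*?), because a ? cannot appear unescaped inside a pattern that uses ? as its delimiter.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: each argument is an array ref holding the paragraphs
    # of one file. Returns the first vendor_id found in each file.
    sub first_cpu_per_file {
        my @files = @_;
        my @found;
        for my $paragraphs (@files) {
            for (@$paragraphs) {
                # m?? succeeds at most once until reset is called
                push @found, $1 if m?cpuinfo\nvendor_id\s+:\s(.*)\n?;
            }
            reset;    # re-arm the one-shot pattern for the next file
        }
        return @found;
    }
    ```

    A file with no cpuinfo paragraph simply contributes nothing to the result.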

    -- Andrew Clarke