PerlMonks  

most terse regex match count

by bliako (Monsignor)
on May 16, 2019 at 15:25 UTC ( [id://11100099]=perlquestion )

bliako has asked for the wisdom of the Perl Monks concerning the following question:

I want to test whether some string contains exactly 2 occurrences of something. But this does not work:

    my $outs = <<'EOS';
    A B C
    A B D
    EOS

    # ok(2 == $outs =~ m|A|gs, "checking exactly 2 matches");
    if( 2 == $outs =~ m|A|gs ){
        print "exactly 2 matches\n"
    }

Is it possible to force list context in a conditional?

Replies are listed 'Best First'.
Re: most terse regex match count
by haukex (Archbishop) on May 16, 2019 at 15:32 UTC
    Is it possible to force list context in a conditional?

    A variation of the "Saturn" operator does the trick: if ( 2 == ( () = $outs =~ m|A|gs ) )
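A minimal sketch of why the original comparison fails and why the empty-list assignment fixes it. The sub names `match_in_scalar_context` and `count_in_list_context` are made up for illustration and do not come from the thread:

```perl
use strict;
use warnings;

my $outs = "A B C\nA B D\n";

# Scalar context: m//g only reports whether the next match succeeded,
# so the result is 1 (true) or "" (false), never a count.
sub match_in_scalar_context {
    my ($str) = @_;
    my $result = $str =~ m|A|g;
    return $result;
}

# List context: m//g returns every match. Assigning that list to ()
# and reading the assignment in scalar context yields the number of
# elements on the right-hand side.
sub count_in_list_context {
    my ($str) = @_;
    my $count = () = $str =~ m|A|g;
    return $count;
}

print "scalar context gives: ", match_in_scalar_context($outs), "\n";
print "list context counts:  ", count_in_list_context($outs), "\n";
```

In the original test, `2 == $outs =~ m|A|gs` therefore compares 2 against 1, which is false however many matches the string holds.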

      Quick thought to reduce capturing: m|()A|gs? If I understand the documentation correctly, the list is populated and then the values are discarded. So collect the smallest possible value instead.

        Quick thought to reduce capturing: m|()A|gs?

        Unfortunately it seems the extra capture group actually slows things down - no matter how much I fiddle with the parameters at the top, nocapt is always about 2x as fast:

        #!/usr/bin/env perl
        use warnings;
        use strict;
        use Benchmark qw/cmpthese/;

        my $count = 1000;
        my $match = 'ABC'x20;
        my $space = 'xAz'x100;
        my $str = ( $space.$match x $count ) . $space;

        cmpthese(-3, {
            nocapt => sub {
                die unless $count == ( () = $str =~ m|\Q$match|gs );
            },
            withcapt => sub {
                die unless $count == ( () = $str =~ m|()\Q$match|gs );
            },
        });
        __END__
                    Rate withcapt   nocapt
        withcapt 4264/s       --     -50%
        nocapt   8596/s     102%       --
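For readers wondering what the two benchmarked variants actually hand back, here is a small sketch. The sample string is an assumption chosen for brevity; it only shows the returned lists and says nothing about why one variant is slower:

```perl
use strict;
use warnings;

my $str = "xAzABCxAzABC";

# Without a capture group, list-context m//g returns each whole match.
my @whole = $str =~ m|ABC|g;

# With an empty capture group, it returns the group's contents instead:
# one empty string per match.
my @empty = $str =~ m|()ABC|g;

printf "whole: %d items (%s)\n", scalar @whole, join(',', @whole);
printf "empty: %d items (%s)\n", scalar @empty, join(',', @empty);
```

Both lists have the same length, so the count is identical either way. Only the size of each returned element differs.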
Re: most terse regex match count
by Fletch (Bishop) on May 16, 2019 at 15:32 UTC

    You want 3/4ths of the operator which shall not be named and some extra parens.

    if( 2 == ( ()= $outs =~ m|A|gs ) ){ print "exactly 2 matches\n" }

    The cake is a lie.

Re: most terse regex match count
by jwkrahn (Abbot) on May 16, 2019 at 19:09 UTC
    $ perl -le'my $x = "A B C\nA B D\n"; print "exactly two matches" if $x =~ /^ [^A]* A [^A]* A [^A]* $/xg'
    exactly two matches
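The one-liner above does not count at all: it describes the whole string as "non-A characters, an A, more non-A characters, an A, more non-A characters". A sketch of the same idea wrapped in a sub follows. The name `has_exactly_two_A` is hypothetical, and `\A` / `\z` are swapped in for `^` / `$` to anchor strictly at the string boundaries:

```perl
use strict;
use warnings;

# True only when the string contains exactly two "A" characters.
# [^A]* also matches newlines, so the pattern spans the whole string.
sub has_exactly_two_A {
    my ($str) = @_;
    return $str =~ /\A [^A]* A [^A]* A [^A]* \z/x ? 1 : 0;
}

for my $s ("A B C\n", "A B C\nA B D\n", "A B C\nA B D\nA D E\n") {
    printf "%d\n", has_exactly_two_A($s);
}
```

This form relies on a negated character class, so it suits single characters; a multi-character needle would need a different pattern.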
Re: most terse regex match count
by holli (Abbot) on May 17, 2019 at 09:30 UTC
    map gets abused so often, it can take another hit:
    if ( 2 == map $_, $outs =~ m|A|gs )
    Or maybe it's no abuse. Could it be that a plain map $_, construct gets optimized by perl somehow?
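A short sketch of why the map version counts correctly, using the sample data from the question. It illustrates the context rules only and makes no claim about whether perl optimizes the construct:

```perl
use strict;
use warnings;

my $outs = "A B C\nA B D\n";

# == imposes scalar context on map, which then returns the number of
# elements it generated. The match in map's argument list still runs
# in list context, so it yields every match.
my $is_two = ( 2 == map $_, $outs =~ m|A|gs ) ? 1 : 0;

# The same count, stored explicitly via scalar assignment.
my $count = map $_, $outs =~ m|A|gs;

print "is_two=$is_two count=$count\n";
```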


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: most terse regex match count
by Marshall (Canon) on May 17, 2019 at 05:49 UTC
    Yes, it is possible to achieve your stated objective, although that is perhaps not the best way to code what you are doing.
    Be aware that "terse" does not necessarily mean faster code.

    Perhaps, one way:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $outs = <<'EOS';
    A B C
    A B D
    A D E
    EOS

    print $outs;
    my %hash;
    foreach (split ' ',$outs)
    {
        $hash{$_}++;
    }
    print "Values occurring exactly twice:\n";
    foreach (keys %hash)
    {
        print "$_\n" if ($hash{$_} ==2);
    }
    __END__
    A B C
    A B D
    A D E
    Values occurring exactly twice:
    B
    D
    Another way:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $outs = <<'EOS';
    A B C
    A B D
    A D E
    EOS

    if ( (()=$outs =~ m|D|g)==2)
    {
        print "exactly 2 matches for D\n"
    }

    # I think better written as:
    my @matches = $outs =~ m|D|g;
    print "exactly 2 matches for D\n" if @matches ==2;
    Update: Without running deparse, I figure that ()=$outs =~ m|D|g is going to create an internal array similar to @matches, it just won't have a name in the source code. I like the 2 line version because I don't blind the reader with parens and it is both a) very easy to understand and b) will run just as quickly as the one line version. Do not mistake "terse" for "speed". It can even happen that terse is slower.

    Of course, if just looking for a single letter, tr is the fastest:

    if ( $outs =~ tr/D// == 2) { print "exactly 2 matches for D\n" }
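A small sketch of how the tr count behaves, with the three-line sample data from above:

```perl
use strict;
use warnings;

my $outs = "A B C\nA B D\nA D E\n";

# With an empty replacement list and no /d, tr/// leaves the string
# unchanged and returns how many characters matched.
my $d_count = ( $outs =~ tr/D// );

# tr counts characters from a set, not substrings:
# tr/AB// counts every "A" and every "B" individually.
my $ab_count = ( $outs =~ tr/AB// );

print "D: $d_count, A or B: $ab_count\n";
```

Because tr works on character sets, it cannot count a multi-character needle such as "A B".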
      Be aware that "terse" does not necessarily mean f***** code.
      SHH! Don't use the f-word here or people will start posting benchmarks.


      holli

      You can lead your users to water, but alas, you cannot drown them.
       Be aware that "terse" does not necessarily mean faster code.

      Sure, I was just trying to show off to the future generations looking at the test file. tr is an interesting idea but I need to find longer strings.
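For longer literal strings, one possible sketch walks the string with index() instead of a regex. The sub name `count_literal` is hypothetical, and counting non-overlapping hits is an assumption about what is wanted:

```perl
use strict;
use warnings;

# Count non-overlapping occurrences of a literal substring.
sub count_literal {
    my ($haystack, $needle) = @_;
    return 0 if $needle eq '';          # avoid an endless loop
    my ($count, $pos) = (0, 0);
    while ( ( $pos = index($haystack, $needle, $pos) ) != -1 ) {
        $count++;
        $pos += length $needle;         # resume after this hit
    }
    return $count;
}

my $outs = "A B C\nA B D\n";
print count_literal($outs, "A B"), "\n";

# The regex form from earlier in the thread, with \Q to quote the needle.
my $needle = "A B";
my $regex_count = () = $outs =~ m|\Q$needle|g;
print "$regex_count\n";
```

Both approaches skip overlapping occurrences, so "aa" is found twice in "aaaa".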

Re: most terse regex match count
by bliako (Monsignor) on May 16, 2019 at 15:51 UTC

    thanks! A part of Perl I have avoided so far. Saturn reminded me of mooning

        beats SO any and every day

Node Type: perlquestion [id://11100099]
Approved by marto
Front-paged by Corion