Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Use function as a regex

by stevieb (Canon)
on Feb 15, 2018 at 16:22 UTC ( [id://1209231]=perlquestion: print w/replies, xml ) Need Help??

stevieb has asked for the wisdom of the Perl Monks concerning the following question:

I've got a very large and complex distribution where several of the modules use a pretty high number of somewhat complex regexes. I have decided instead of having them peppered throughout the code, I'd create a new module, Regex.pm that would house and return these regexes based on name.

Now, this all works well and fine after some fiddling and learning where certain flags need to be set. Here is a basic example:

use warnings; use strict; package Re; { my %h = ( re => qr/ [Pp]erl-\d\.\d+\.\d+(?:_\w+)? \s+===.*? (?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$)) /xs, ); sub re { return $h{re}; } } package main; { my $str; { local $/; $str = <DATA>; } my $re = Re::re(); my @results = $str =~ /$re/g; print scalar @results; } __DATA__ perl-5.26.1 ========== Reading '/home/spek/.cpan/Metadata' Database was generated on Tue, 13 Feb 2018 15:29:02 GMT App::cpanminus is up to date (1.7043). --> Working on . Configuring /home/spek/repos/mock-sub ... OK <== Installed dependencies for .. Finishing. --> Working on . Configuring /home/spek/repos/mock-sub ... Generating a Unix-style Make +file Writing Makefile for Mock::Sub Writing MYMETA.yml and MYMETA.json OK Building and testing Mock-Sub-1.10 ... Skip blib/lib/Mock/Sub.pm (unch +anged) Skip blib/lib/Mock/Sub/Child.pm (unchanged) Manifying 2 pod documents PERL_DL_NONLAZY=1 "/home/spek/perl5/perlbrew/perls/perl-5.26.1/bin/per +l" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Har +ness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t t/00-load.t .................... ok t/01-called.t .................. ok t/02-called_count.t ............ ok t/03-instantiate.t ............. ok t/04-return_value.t ............ ok t/05-side_effect.t ............. ok t/06-reset.t ................... ok t/07-name.t .................... ok t/08-called_with.t ............. ok t/09-void_context.t ............ ok t/10-unmock.t .................. ok t/11-state.t ................... ok t/12-mocked_subs.t ............. ok t/13-mocked_objects.t .......... ok t/14-core_subs.t ............... ok t/15-remock.t .................. ok t/16-non_exist_warn.t .......... ok t/17-no_warnings.t ............. ok t/18-bug_25-retval_override.t .. ok t/19-return_params.t ........... ok t/manifest.t ................... skipped: Author tests not required fo +r installation t/pod-coverage.t ............... skipped: Author tests not required fo +r installation t/pod.t ........................ skipped: Author tests not required fo +r installation All tests successful. Files=23, Tests=243, 2 wallclock secs ( 0.13 usr 0.04 sys + 1.75 cu +sr 0.13 csys = 2.05 CPU) Result: PASS OK Successfully tested Mock-Sub-1.10

In the code, I've got this:

my $re = Re::re(); my @results = $str =~ /$re/g;

What I'm wondering, and haven't been able to sort out if it's possible, is skip the variable instantiation, and use the function call directly when using the regex, like this:

my @results = $str =~ /Re::re()/g;

Doable, or am I chasing down something impossible?

Replies are listed 'Best First'.
Re: Use function as a regex
by choroba (Cardinal) on Feb 15, 2018 at 16:37 UTC
    You can use a reference-dereference trick, or the experimental postponed regular subexpression.
    #!/usr/bin/perl use warnings; use strict; use feature qw{ say }; { package Re; sub re { qr/perl/i } } say 'perl Perl PERL' =~ /${ \Re::re() }/g; say 'perl Perl PERL' =~ /(??{ Re::re() })/g;
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      say 'perl Perl PERL' =~ /${ \Re::re() }/g;

      Abusing the internal representation of regular expressions, this also seems to work:

      say 'perl Perl PERL' =~ /${Re::re()}/g;
Re: Use function as a regex
by salva (Canon) on Feb 15, 2018 at 17:06 UTC
    Probably overkill, but you can check how Regexp::Common does it.

    It exports the regexps in a tied hash, and uses some tricks to generate parametrized regexps on demand.

    The interesting thing is that you can interpolate the hash slots directly as in $foo =~ /$RE{balanced}{-parens=>'()'}/

      This is a great idea! I'll have a good look at Regexp::Common over the weekend. I did get a full implementation and all my tests are passing on both Unix and Windows, so I did get it to work correctly. I've moved on to updating berrybrew with some new enhancements/bug fixes before I can get back to that distribution, as it relies on some berrybrew changes anyway.

      I'll create a new branch and test it out, but at this time, this is what I ended up with:

      package Test::BrewBuild::Regex; use strict; use warnings; use Carp qw(croak); use Exporter qw(import); our $VERSION = '2.20'; our @EXPORT = qw( re_brewbuild re_brewcommands re_dispatch re_git ); my %brewbuild = ( check_failed => qr{failed.*?See\s+(.*?)\s+for details}, check_result => qr{ [Pp]erl-\d\.\d+\.\d+(?:_\w+)? \s+===.*? (?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$)) }xs, extract_dzil_dist_name => qr{^name\s+=\s+(.*)$}, extract_dzil_dist_version => qr{^version\s+=\s+(.*)$}, extract_errors => qr{ cpanm\s+\(App::cpanminus\) .*? (?=(?:cpanm\s+\(App::cpanminus\)|$)) }xs, extract_error_perl_ver => qr{cpanm.*?perl\s(5\.\d+)\s}, extract_result => qr{ ([Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+=+?) (\s+.*?) (?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$)) }xs, extract_perl_version => qr{^([Pp]erl-\d\.\d+\.\d+(_\d{2})?)}, ); my %brewcommands = ( available_berrybrew => qr{(\d\.\d+\.\d+_\d+)}, available_perlbrew => qr{(?<!c)(perl-\d\.\d+\.\d+(?:-RC\d+)?)}, installed_berrybrew => qr{(\d\.\d{2}\.\d(?:_\d{2}))(?!=_)\s+\[inst +alled\]}i, installed_perlbrew => qr{i\s+(perl-\d\.\d+\.\d+)}, using_berrybrew => qr{(\d\.\d{2}\.\d(?:_\d{2}))(?!=_)\s+\[installe +d\]\s+\*}i, ); my %dispatch = ( extract_short_results => qr{(5\.\d{1,2}\.\d{1,2} :: \w{4})}, ); my %git = ( extract_repo_name => qr{.*/(.*?)(?:\.git)*$}, extract_commit_csum => qr{([A-F0-9]{40})\s+HEAD}i, ); sub re_brewbuild { my $re = shift; _check(\%brewbuild, $re); return $brewbuild{$re}; } sub re_brewcommands { my $re = shift; _check(\%brewcommands, $re); return $brewcommands{$re}; } sub re_dispatch { my $re = shift; _check(\%dispatch, $re); return $dispatch{$re}; } sub re_git { my $re = shift; _check(\%git, $re); return $git{$re}; } sub _check { my ($module, $re) = @_; croak "regex '$re' doesn't exist for re_${module}()" if ! exists $module->{$re}; } 1;

      (POD redacted).

      Thanks for all of the tips and advice everyone!

      That is exactly how I would do it and in fact have done it. Copy the text of one of those modules and adapt it to your project's namespace, then refill it with the regexes that you need. It isn't overkill at all.
Re: Use function as a regex
by tybalt89 (Monsignor) on Feb 15, 2018 at 16:34 UTC
    my @results = $str =~ /@{[ Re::re() ]}/g;

      Heh, that's quite slick, thanks! :)

      Might I ask where you found that tidbit? I swear I read the docs, so I must have overlooked that somewhere. I'd like to get an understanding of how and why that works properly.

        I don't think you'll find it as is in the doc. It's a side effect of the "@{ REF }" interpolation where REF can actually be an arbitrary expression, so it can be an anonymous array ref that only contains the one element you want. It's known as a secret operator.

Re: Use function as a regex
by Eily (Monsignor) on Feb 15, 2018 at 16:35 UTC

    How about just Re::re($str) instead? Or maybe $str =~ /$Re::h{re}/g

    Without the /g you would have been able to use $str =~ Re::re(). But to have the global search I can only think of $str =~ /@{[ Re::re() ]}/g or /(??{ Re::re() })/g which aren't much of an improvement.

    Can your subs have parameters? If there is one regex for each sub, where the name of the sub is also the key in the hash, you maybe be interested in AUTOLOAD.

    Edit: choroba made me realize it's (??{ }) not (?{{ }}) (which should have been obvious, as the latter is actually a case of (?{ }))

      The sub(s) will take parameters. One sub per module in the distribution. Here's the real code section of the new module. I've only added one module's sub so far:

      package Test::BrewBuild::Regex; use strict; use warnings; use Carp qw(croak); use Exporter qw(import); our $VERSION = '2.20'; our @EXPORT = qw( re_brewbuild ); my %brewbuild = ( check_failed => qr{ failed.*?See\s+(.*?)\s+for details }x, check_result => qr{ [Pp]erl-\d\.\d+\.\d+(?:_\w+)? \s+===.*? (?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$)) }xs, extract_dist_name => qr{ ^name\s+=\s+(.*)$ }x, extract_dist_version => qr{ ^version\s+=\s+(.*)$ }x, extract_errors => qr{ cpanm\s+\(App::cpanminus\) .*? (?=(?:cpanm\s+\(App::cpanminus\)|$)) }xs, extract_error_perl_ver => qr{ cpanm.*?perl\s(5\.\d+)\s }x, extract_result => qr{ ([Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+=+?) (\s+.*?) (?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$)) }xs, extract_perl_version => qr{ ^([Pp]erl-\d\.\d+\.\d+(_\d{2})?) }x, ); sub re_brewbuild { my $re = shift; _check(\%brewbuild, $re); return $brewbuild{$re}; } sub _check { my ($module, $re) = @_; croak "regex '$re' doesn't exist for re_brewbuild()" if ! exists $module->{$re}; } 1;

      You can probably use straight-up $str =~ re() syntax with global searches as well, just make sure the re() returns a :lvalue. Either that or return a ref: $str =~ ${re()}.

        I don't see any connexion between :lvalue and /g. $str =~ ${re()} isn't a global search but the output of qr is already a reference so /${re()}/g works, as pointed out by salva.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1209231]
Approved by herveus
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2024-04-20 00:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found