stevieb has asked for the wisdom of the Perl Monks concerning the following question:
I've got a very large and complex distribution where several of the modules use a pretty high number of somewhat complex regexes. I have decided instead of having them peppered throughout the code, I'd create a new module, Regex.pm that would house and return these regexes based on name.
Now, this all works well and fine after some fiddling and learning where certain flags need to be set. Here is a basic example:
use warnings;
use strict;
package Re; {
my %h = (
re => qr/
[Pp]erl-\d\.\d+\.\d+(?:_\w+)?
\s+===.*?
(?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$))
/xs,
);
sub re {
return $h{re};
}
}
package main; {
my $str;
{
local $/;
$str = <DATA>;
}
my $re = Re::re();
my @results = $str =~ /$re/g;
print scalar @results;
}
__DATA__
perl-5.26.1
==========
Reading '/home/spek/.cpan/Metadata'
Database was generated on Tue, 13 Feb 2018 15:29:02 GMT
App::cpanminus is up to date (1.7043).
--> Working on .
Configuring /home/spek/repos/mock-sub ... OK
<== Installed dependencies for .. Finishing.
--> Working on .
Configuring /home/spek/repos/mock-sub ... Generating a Unix-style Make
+file
Writing Makefile for Mock::Sub
Writing MYMETA.yml and MYMETA.json
OK
Building and testing Mock-Sub-1.10 ... Skip blib/lib/Mock/Sub.pm (unch
+anged)
Skip blib/lib/Mock/Sub/Child.pm (unchanged)
Manifying 2 pod documents
PERL_DL_NONLAZY=1 "/home/spek/perl5/perlbrew/perls/perl-5.26.1/bin/per
+l" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Har
+ness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/00-load.t .................... ok
t/01-called.t .................. ok
t/02-called_count.t ............ ok
t/03-instantiate.t ............. ok
t/04-return_value.t ............ ok
t/05-side_effect.t ............. ok
t/06-reset.t ................... ok
t/07-name.t .................... ok
t/08-called_with.t ............. ok
t/09-void_context.t ............ ok
t/10-unmock.t .................. ok
t/11-state.t ................... ok
t/12-mocked_subs.t ............. ok
t/13-mocked_objects.t .......... ok
t/14-core_subs.t ............... ok
t/15-remock.t .................. ok
t/16-non_exist_warn.t .......... ok
t/17-no_warnings.t ............. ok
t/18-bug_25-retval_override.t .. ok
t/19-return_params.t ........... ok
t/manifest.t ................... skipped: Author tests not required fo
+r installation
t/pod-coverage.t ............... skipped: Author tests not required fo
+r installation
t/pod.t ........................ skipped: Author tests not required fo
+r installation
All tests successful.
Files=23, Tests=243, 2 wallclock secs ( 0.13 usr 0.04 sys + 1.75 cu
+sr 0.13 csys = 2.05 CPU)
Result: PASS
OK
Successfully tested Mock-Sub-1.10
In the code, I've got this:
my $re = Re::re();
my @results = $str =~ /$re/g;
What I'm wondering, and haven't been able to sort out if it's possible, is skip the variable instantiation, and use the function call directly when using the regex, like this:
my @results = $str =~ /Re::re()/g;
Doable, or am I chasing down something impossible?
Re: Use function as a regex
by choroba (Cardinal) on Feb 15, 2018 at 16:37 UTC
|
You can use a reference-dereference trick, or the experimental postponed regular subexpression.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
{ package Re;
sub re { qr/perl/i }
}
say 'perl Perl PERL' =~ /${ \Re::re() }/g;
say 'perl Perl PERL' =~ /(??{ Re::re() })/g;
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
| [reply] [d/l] [select] |
|
say 'perl Perl PERL' =~ /${Re::re()}/g;
| [reply] [d/l] [select] |
Re: Use function as a regex
by salva (Canon) on Feb 15, 2018 at 17:06 UTC
|
Probably overkill, but you can check how Regexp::Common does it.
It exports the regexps in a tied hash, and uses some tricks to generate parametrized regexps on demand.
The interesting thing is that you can interpolate the hash slots directly as in $foo =~ /$RE{balanced}{-parens=>'()'}/ | [reply] [d/l] |
|
This is a great idea! I'll have a good look at Regexp::Common over the weekend. I did get a full implementation and all my tests are passing on both Unix and Windows, so I did get it to work correctly. I've moved on to updating berrybrew with some new enhancements/bug fixes before I can get back to that distribution, as it relies on some berrybrew changes anyway.
I'll create a new branch and test it out, but at this time, this is what I ended up with:
package Test::BrewBuild::Regex;
use strict;
use warnings;
use Carp qw(croak);
use Exporter qw(import);
our $VERSION = '2.20';
our @EXPORT = qw(
re_brewbuild
re_brewcommands
re_dispatch
re_git
);
my %brewbuild = (
check_failed => qr{failed.*?See\s+(.*?)\s+for details},
check_result => qr{
[Pp]erl-\d\.\d+\.\d+(?:_\w+)?
\s+===.*?
(?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$))
}xs,
extract_dzil_dist_name => qr{^name\s+=\s+(.*)$},
extract_dzil_dist_version => qr{^version\s+=\s+(.*)$},
extract_errors => qr{
cpanm\s+\(App::cpanminus\)
.*?
(?=(?:cpanm\s+\(App::cpanminus\)|$))
}xs,
extract_error_perl_ver => qr{cpanm.*?perl\s(5\.\d+)\s},
extract_result => qr{
([Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+=+?)
(\s+.*?)
(?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$))
}xs,
extract_perl_version => qr{^([Pp]erl-\d\.\d+\.\d+(_\d{2})?)},
);
my %brewcommands = (
available_berrybrew => qr{(\d\.\d+\.\d+_\d+)},
available_perlbrew => qr{(?<!c)(perl-\d\.\d+\.\d+(?:-RC\d+)?)},
installed_berrybrew => qr{(\d\.\d{2}\.\d(?:_\d{2}))(?!=_)\s+\[inst
+alled\]}i,
installed_perlbrew => qr{i\s+(perl-\d\.\d+\.\d+)},
using_berrybrew => qr{(\d\.\d{2}\.\d(?:_\d{2}))(?!=_)\s+\[installe
+d\]\s+\*}i,
);
my %dispatch = (
extract_short_results => qr{(5\.\d{1,2}\.\d{1,2} :: \w{4})},
);
my %git = (
extract_repo_name => qr{.*/(.*?)(?:\.git)*$},
extract_commit_csum => qr{([A-F0-9]{40})\s+HEAD}i,
);
sub re_brewbuild {
my $re = shift;
_check(\%brewbuild, $re);
return $brewbuild{$re};
}
sub re_brewcommands {
my $re = shift;
_check(\%brewcommands, $re);
return $brewcommands{$re};
}
sub re_dispatch {
my $re = shift;
_check(\%dispatch, $re);
return $dispatch{$re};
}
sub re_git {
my $re = shift;
_check(\%git, $re);
return $git{$re};
}
sub _check {
my ($module, $re) = @_;
croak "regex '$re' doesn't exist for re_${module}()"
if ! exists $module->{$re};
}
1;
(POD redacted).
Thanks for all of the tips and advice everyone! | [reply] [d/l] [select] |
|
That is exactly how I would do it and in fact have done it. Copy the text of one of those modules and adapt it to your project's namespace, then refill it with the regexes that you need. It isn't overkill at all.
| [reply] |
Re: Use function as a regex
by tybalt89 (Monsignor) on Feb 15, 2018 at 16:34 UTC
|
my @results = $str =~ /@{[ Re::re() ]}/g;
| [reply] [d/l] |
|
Heh, that's quite slick, thanks! :)
Might I ask where you found that tidbit? I swear I read the docs, so I must have overlooked that somewhere. I'd like to get an understanding of how and why that works properly.
| [reply] |
|
| [reply] |
Re: Use function as a regex
by Eily (Monsignor) on Feb 15, 2018 at 16:35 UTC
|
How about just Re::re($str) instead? Or maybe $str =~ /$Re::h{re}/g
Without the /g you would have been able to use $str =~ Re::re(). But to have the global search I can only think of $str =~ /@{[ Re::re() ]}/g or /(??{ Re::re() })/g which aren't much of an improvement.
Can your subs have parameters? If there is one regex for each sub, where the name of the sub is also the key in the hash, you maybe be interested in AUTOLOAD.
Edit: choroba made me realize it's (??{ }) not (?{{ }}) (which should have been obvious, as the latter is actually a case of (?{ }))
| [reply] [d/l] [select] |
|
package Test::BrewBuild::Regex;
use strict;
use warnings;
use Carp qw(croak);
use Exporter qw(import);
our $VERSION = '2.20';
our @EXPORT = qw(
re_brewbuild
);
my %brewbuild = (
check_failed => qr{
failed.*?See\s+(.*?)\s+for details
}x,
check_result => qr{
[Pp]erl-\d\.\d+\.\d+(?:_\w+)?
\s+===.*?
(?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$))
}xs,
extract_dist_name => qr{
^name\s+=\s+(.*)$
}x,
extract_dist_version => qr{
^version\s+=\s+(.*)$
}x,
extract_errors => qr{
cpanm\s+\(App::cpanminus\)
.*?
(?=(?:cpanm\s+\(App::cpanminus\)|$))
}xs,
extract_error_perl_ver => qr{
cpanm.*?perl\s(5\.\d+)\s
}x,
extract_result => qr{
([Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+=+?)
(\s+.*?)
(?=(?:[Pp]erl-\d\.\d+\.\d+(?:_\w+)?\s+===|$))
}xs,
extract_perl_version => qr{
^([Pp]erl-\d\.\d+\.\d+(_\d{2})?)
}x,
);
sub re_brewbuild {
my $re = shift;
_check(\%brewbuild, $re);
return $brewbuild{$re};
}
sub _check {
my ($module, $re) = @_;
croak "regex '$re' doesn't exist for re_brewbuild()"
if ! exists $module->{$re};
}
1;
| [reply] [d/l] |
|
| [reply] [d/l] [select] |
|
| [reply] [d/l] |
|
|
|