Note 'saxyzzy' and'Wen' in the output:
Win8 Strawberry 5.30.3.1 (64) Tue 07/26/2022 6:46:09
C:\@Work\Perl\monks
>perl
use strict;
use warnings;
my $input = q/saxyzzy Monday Saturday Thursday Saturday Sat Sun Mon Tu
+e Wen Th/;
my $re =
join "|",
map ".*\\b($_\\S*)",
map m/(..)/,
qw/monday tuesday wednesday thursday friday saturday sunday/
;
print "\$re '$re' \n";
print join '.', $input =~ m/^(?|$re)\b(?{ print "$1\n" })(*FAIL)/i;
^Z
$re '.*\b(mo\S*)|.*\b(tu\S*)|.*\b(we\S*)|.*\b(th\S*)|.*\b(fr\S*)|.*\b(
+sa\S*)|.*\b(su\S*)'
Mon
Monday
Tue
Wen
Th
Thursday
Sat
Saturday
Saturday
saxyzzy
Sun
This can be avoided with more complete automatic pattern generation, but some false positives will remain, e.g., 'satu' for Saturday.
File sort_weekdays_2.pl:
# sort_weekdays_2.pl
# method relies on left-anchored nature of "short" day names,
# e.g., th/thu/thur/thurs/thursday.
# automatic pattern generation. possibility of false-positive
# matches exists, e.g., 'wedne' for wednesday.
use 5.018; # need reliable regex interaction with lexicals
use 5.010; # need regex extended patterns (?|...) branch reset
use strict;
use warnings;
use Test::More 'tests' => 2;
use Test::NoWarnings;
use Data::Dump qw(dd);
# works - false positive matches possible.
my ($rx_days) =
map qr{ (?i) $_ }xms,
join ' | ',
# map { dd $_; $_; } # for debug
map ".*? \\b (${ \day_parts($_) })", # works
# map { dd $_; $_; } # for debug
map { @$_ == 3 or die "bad day '$_'"; $_; } # minimal validation
map [ m{ \A ([motuwehfrsa]{2}) ([nesdritu]+?) (day) \z }xms ], #
+works
# map [ m{ \A ([a-z]{2}) ([a-z]+?) (day) \z }xms ], # works
qw/monday tuesday wednesday thursday friday saturday sunday/
;
# print "\$rx_days $rx_days \n"; # for debug
my $test = <<'EOS';
Monday Saturday Thursday Saturday Sat Sun Mon Tue Th
nothing on this or the following line should extract
shdhsd mond s mondaytuesday mondayxtuesday xsun sunx xsunx saxyzzy Wen
mo tu we th fr sa su mon tue wed thu fri sat sun
blank lines don't matter
mOnDaY TuE wEd ThUrS mO sUn
false positive matches are possible with automatic pattern generation
satu wednes
EOS
# day-names extracted in day order, then by order of
# appearance in the source string.
# (?{ ... }) interface to lexical variable only reliable for perl ver
+5.18+
# this problem can be avoided by using package-global variables.
my @got; # lexical unreliable in older perls
$test =~ m{ \A (?| $rx_days) \b (?{ push @got, $^N }) (*FAIL) }xms;
# dd \@got; # for debug
is_deeply \@got, [ qw(
Monday Mon mo mon mOnDaY mO
Tue tu tue TuE
we wed wEd wednes
Thursday Th th thu ThUrS
fr fri
Saturday Saturday Sat sa sat satu
Sun su sun sUn
)], 'extraction sorted by day-order';
sub day_parts {
my ($ar_parts, # substrings to assemble to nested regex
) = @_;
# starts like [ 'tu', 'es', 'day' ]
# intermediate ( 'tu', 'e', 's', 'day' )
# ends like tu (?: e (?: s (?: day)?)?)?
my $start = $ar_parts->[0];
my @parts = (split('', $ar_parts->[1]), $ar_parts->[2]);
return "$start @{[ _day_parts(@parts) ]}";
}
sub _day_parts { return @_ ? "(?: @{[ shift, _day_parts(@_) ]})?" : ()
+; }
False-positive day-name matches can be entirely avoided with a custom regex:
my $rx_days = qr{ (?i) # custom patterns
.*? \K \b mo (?: n (?: day)?)? |
.*? \K \b tu (?: e (?: s (?: day)?)?)? |
.*? \K \b we (?: d (?: nes day)?)? |
.*? \K \b th (?: u (?: r (?: s (?: day)?)?)?)? |
.*? \K \b fr (?: i (?: day)?)? |
.*? \K \b sa (?: t (?: ur day)?)? |
.*? \K \b su (?: n (?: day)?)?
}xms;
My somewhat arbitrary use of
\K in this regex means the extraction regex must be changed. This works:
$test =~ m{ \A $rx_days \b (?{ push @got, ${^MATCH} }) (*FAIL) }xmsp;
Use of the
${^MATCH} match variable (and
/p modifier) can be avoided by use of the
substr $test, $-[0], $+[0]-$-[0]
expression.
Give a man a fish: <%-{-{-{-<