Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^5: Challenge: sort weekdays in week-order (grep)

by AnomalousMonk (Archbishop)
on Jul 26, 2022 at 19:07 UTC ( [id://11145768]=note: print w/replies, xml ) Need Help??


in reply to Re^4: Challenge: sort weekdays in week-order (grep)
in thread Challenge: sort weekdays in week-order (elegantly and efficiently)

Note 'saxyzzy' and'Wen' in the output:

Win8 Strawberry 5.30.3.1 (64) Tue 07/26/2022 6:46:09 C:\@Work\Perl\monks >perl use strict; use warnings; my $input = q/saxyzzy Monday Saturday Thursday Saturday Sat Sun Mon Tu +e Wen Th/; my $re = join "|", map ".*\\b($_\\S*)", map m/(..)/, qw/monday tuesday wednesday thursday friday saturday sunday/ ; print "\$re '$re' \n"; print join '.', $input =~ m/^(?|$re)\b(?{ print "$1\n" })(*FAIL)/i; ^Z $re '.*\b(mo\S*)|.*\b(tu\S*)|.*\b(we\S*)|.*\b(th\S*)|.*\b(fr\S*)|.*\b( +sa\S*)|.*\b(su\S*)' Mon Monday Tue Wen Th Thursday Sat Saturday Saturday saxyzzy Sun

This can be avoided with more complete automatic pattern generation, but some false positives will remain, e.g., 'satu' for Saturday.

File sort_weekdays_2.pl:

# sort_weekdays_2.pl # method relies on left-anchored nature of "short" day names, # e.g., th/thu/thur/thurs/thursday. # automatic pattern generation. possibility of false-positive # matches exists, e.g., 'wedne' for wednesday. use 5.018; # need reliable regex interaction with lexicals use 5.010; # need regex extended patterns (?|...) branch reset use strict; use warnings; use Test::More 'tests' => 2; use Test::NoWarnings; use Data::Dump qw(dd); # works - false positive matches possible. my ($rx_days) = map qr{ (?i) $_ }xms, join ' | ', # map { dd $_; $_; } # for debug map ".*? \\b (${ \day_parts($_) })", # works # map { dd $_; $_; } # for debug map { @$_ == 3 or die "bad day '$_'"; $_; } # minimal validation map [ m{ \A ([motuwehfrsa]{2}) ([nesdritu]+?) (day) \z }xms ], # +works # map [ m{ \A ([a-z]{2}) ([a-z]+?) (day) \z }xms ], # works qw/monday tuesday wednesday thursday friday saturday sunday/ ; # print "\$rx_days $rx_days \n"; # for debug my $test = <<'EOS'; Monday Saturday Thursday Saturday Sat Sun Mon Tue Th nothing on this or the following line should extract shdhsd mond s mondaytuesday mondayxtuesday xsun sunx xsunx saxyzzy Wen mo tu we th fr sa su mon tue wed thu fri sat sun blank lines don't matter mOnDaY TuE wEd ThUrS mO sUn false positive matches are possible with automatic pattern generation satu wednes EOS # day-names extracted in day order, then by order of # appearance in the source string. # (?{ ... }) interface to lexical variable only reliable for perl ver +5.18+ # this problem can be avoided by using package-global variables. my @got; # lexical unreliable in older perls $test =~ m{ \A (?| $rx_days) \b (?{ push @got, $^N }) (*FAIL) }xms; # dd \@got; # for debug is_deeply \@got, [ qw( Monday Mon mo mon mOnDaY mO Tue tu tue TuE we wed wEd wednes Thursday Th th thu ThUrS fr fri Saturday Saturday Sat sa sat satu Sun su sun sUn )], 'extraction sorted by day-order'; sub day_parts { my ($ar_parts, # substrings to assemble to nested regex ) = @_; # starts like [ 'tu', 'es', 'day' ] # intermediate ( 'tu', 'e', 's', 'day' ) # ends like tu (?: e (?: s (?: day)?)?)? my $start = $ar_parts->[0]; my @parts = (split('', $ar_parts->[1]), $ar_parts->[2]); return "$start @{[ _day_parts(@parts) ]}"; } sub _day_parts { return @_ ? "(?: @{[ shift, _day_parts(@_) ]})?" : () +; }

False-positive day-name matches can be entirely avoided with a custom regex:

my $rx_days = qr{ (?i) # custom patterns .*? \K \b mo (?: n (?: day)?)? | .*? \K \b tu (?: e (?: s (?: day)?)?)? | .*? \K \b we (?: d (?: nes day)?)? | .*? \K \b th (?: u (?: r (?: s (?: day)?)?)?)? | .*? \K \b fr (?: i (?: day)?)? | .*? \K \b sa (?: t (?: ur day)?)? | .*? \K \b su (?: n (?: day)?)? }xms;
My somewhat arbitrary use of \K in this regex means the extraction regex must be changed. This works:
    $test =~ m{ \A $rx_days \b (?{ push @got, ${^MATCH} }) (*FAIL) }xmsp;
Use of the ${^MATCH} match variable (and /p modifier) can be avoided by use of the
    substr $test, $-[0], $+[0]-$-[0]
expression.


Give a man a fish:  <%-{-{-{-<

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11145768]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (3)
As of 2024-04-24 15:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found