PerlMonks  

Re: Pattern Matching

by AnomalousMonk (Archbishop)
on Mar 17, 2017 at 03:49 UTC ( [id://1184952]=note )


in reply to Pattern Matching

huck has shown why your regex (correctly!) matches a single space. Here's the approach I would take to a solution. (Note that I am sure there are CPAN modules to do all this much better!) Pay particular attention to adding test cases to the no-match section of tests. I would also add some mixed-case tests to the all-match section.

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     my $rx_1_3  = qr{ (?i) i{1,3} }xms;
     my $rx_1_9  = qr{ (?i) (?: $rx_1_3 | iv | v $rx_1_3? | ix) }xms;
     my $rx_1_39 = qr{ (?i) (?: $rx_1_9 | x{1,3} $rx_1_9?) }xms;
     ;;
     my $pat = qr{ [(]? \b $rx_1_39 \b (?: [.] | [)][.]?) [ ] }xms;
     ;;
     use constant ROMAN_1_39 => qw(
         i     ii     iii     iv     v     vi     vii     viii     ix     x
         xi    xii    xiii    xiv    xv    xvi    xvii    xviii    xix    xx
         xxi   xxii   xxiii   xxiv   xxv   xxvi   xxvii   xxviii   xxix   xxx
         xxxi  xxxii  xxxiii  xxxiv  xxxv  xxxvi  xxxvii  xxxviii  xxxix
         );
     ;;
     note 'perl version: ', $];
     ;;
     my $test_regex = qr{ \A $pat \z }xms;
     note 'test regex: ', $test_regex;
     ;;
     note 'ALL must match';
     for my $roman (ROMAN_1_39, map uc, ROMAN_1_39) {
       for my $pre ('', '(') {
         for my $post (qw/. ) )./) {
           my $rs = qq{$pre$roman$post };
           ok $rs =~ $test_regex, qq{'$rs'};
           }
         }
       }
     ;;
     note 'NONE shall pass!';
     for my $nomatch (ROMAN_1_39, ' ',
         qw(iiii ixxxix xxxixi ixxxixi etc),
         ) {
       ok $nomatch !~ $test_regex, qq{'$nomatch'};
       }
     ;;
     done_testing;
    "
(I won't pollute these sacred spaces with the rather tedious output.)
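
The mixed-case tests suggested above might look something like the following sketch. It rebuilds the same pattern as a standalone script and runs a few hand-picked mixed-case numerals through it. The sample list and the helper name matches_numeral are illustrative assumptions, not part of the original test.

```perl
use strict;
use warnings;

# Same building blocks as in the one-liner above.
my $rx_1_3  = qr{ (?i) i{1,3} }xms;
my $rx_1_9  = qr{ (?i) (?: $rx_1_3 | iv | v $rx_1_3? | ix) }xms;
my $rx_1_39 = qr{ (?i) (?: $rx_1_9 | x{1,3} $rx_1_9?) }xms;
my $pat     = qr{ [(]? \b $rx_1_39 \b (?: [.] | [)][.]?) [ ] }xms;

my $test_regex = qr{ \A $pat \z }xms;

# Hypothetical helper: 1 if the whole string is a single decorated numeral.
sub matches_numeral { return $_[0] =~ $test_regex ? 1 : 0 }

# Hand-picked mixed-case samples (an assumption; extend as needed).
my @mixed = qw( Ii iV Xi xIv XxXiX );

for my $roman (@mixed) {
    for my $pre ('', '(') {
        for my $post ('.', ')', ').') {
            my $rs = "$pre$roman$post ";
            printf "%-12s %s\n", "'$rs'",
                matches_numeral($rs) ? 'match' : 'NO MATCH';
        }
    }
}
```

Every line printed should say "match"; any "NO MATCH" would point at a case-sensitivity gap in one of the sub-patterns.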


Give a man a fish:  <%-{-{-{-<

Replies are listed 'Best First'.
Re^2: Pattern Matching
by huck (Prior) on Mar 17, 2017 at 04:10 UTC

    I liked [.], I'll have to remember that one!

      IIRC, this is from TheDamian's regex PBPs (Perl Best Practices), which I religiously observe (the others, not so much). Things like [(], [.] and [ ] are visually useful, especially [ ], because what the heck does "\ " (backslash-space) mean anyway in /x context, which we ought always to use?
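
A minimal sketch of the point about spaces under /x, assuming a made-up sample string. It compares a bare space, a bracketed space and an escaped space in an /x pattern.

```perl
use strict;
use warnings;

my $text = 'iv. next';    # sample input, assumed for illustration

# Under /x the bare space is ignored, so this pattern is really 'iv[.]next'.
my $bare      = $text =~ /iv[.] next/x      ? 1 : 0;

# A bracketed space is a literal space, even under /x.
my $bracketed = $text =~ /iv [.] [ ] next/x ? 1 : 0;

# Backslash-space is also a literal space, but it is easy to overlook.
my $escaped   = $text =~ /iv [.] \  next/x  ? 1 : 0;

print "bare=$bare bracketed=$bracketed escaped=$escaped\n";
# prints: bare=0 bracketed=1 escaped=1
```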


      Give a man a fish:  <%-{-{-{-<

        I recently decided to plunk

        use re '/x';

        in all my new code to make it the default. It takes a while to get used to the free space around tokens -- it's like antigolfing your regexes.
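
A small sketch of what that pragma does, assuming Perl 5.14 or later (the first release with the use re '/flags' form). The pattern and variable names are made up for illustration. The pragma is lexically scoped, so it only affects patterns compiled inside the enclosing block or file.

```perl
use v5.14;    # "use re '/flags'" needs Perl 5.14 or later
use strict;
use warnings;

my $loose;
{
    use re '/x';    # applies only inside this block
    # No explicit /x needed here; whitespace in the pattern is ignored.
    $loose = 'xiv' =~ / \A x{0,3} (?: iv | ix | v? i{0,3} ) \z / ? 1 : 0;
}

# Outside the block the spaces in the pattern are literal again,
# so this cannot match a string that contains no spaces.
my $literal = 'xiv' =~ / \A xiv \z / ? 1 : 0;

print "inside scope: $loose, outside scope: $literal\n";
# prints: inside scope: 1, outside scope: 0
```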

Re^2: Pattern Matching
by davidas (Initiate) on Mar 17, 2017 at 21:41 UTC

    Thanks. I'll have a good play about with this. I didn't go the CPAN module route because of the requirement to match leading and trailing brackets, period and spaces, which I thought would be more specific to my particular requirement. Ironically, that's not what caused the problem, though!

      ... the requirement to match leading and trailing brackets, period and spaces ...

      I had in mind using a CPAN module only as a source for a regex for dependably recognizing the Roman-numeric part of your string, something along the lines of what Regexp::Common provides. Unfortunately, this module does not seem to support Roman numerals.

      Ok, then maybe use the Roman-to-decimal conversion functions of Roman or Text::Roman (but I've not used either of these modules and so can't recommend them) or some such to test for the 1 .. 39 range of a Roman sequence extracted with a simple  [ivxIVX]+ capture. The advantage of using such a module is that it is, one presumes, well-tested. (These modules both provide an  isroman() function that would, one would hope, reject something like ixixixix, but I haven't checked this.)
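
The extract-then-check idea could be sketched as below. The conversion subs roman_to_int and int_to_roman are hand-rolled, hypothetical stand-ins for whatever a CPAN module would provide (they are not the API of Roman or Text::Roman), and they only know i, v and x. Validity is checked by converting back and comparing, which is one possible approach rather than what any module necessarily does.

```perl
use strict;
use warnings;

my %VALUE = (i => 1, v => 5, x => 10);

# Hypothetical stand-in for a module's Roman-to-decimal function.
# Handles only i, v, x and does no validity checking by itself.
sub roman_to_int {
    my @digits = map { $VALUE{$_} } split //, lc shift;
    my $total  = 0;
    for my $i (0 .. $#digits) {
        # A smaller digit before a larger one is subtracted (iv = 4, ix = 9).
        my $subtract = $i < $#digits && $digits[$i] < $digits[$i + 1];
        $total += $subtract ? -$digits[$i] : $digits[$i];
    }
    return $total;
}

# Hypothetical decimal-to-Roman function, valid for 1 .. 39 only.
sub int_to_roman {
    my ($n) = @_;
    my @ones = ('', qw(i ii iii iv v vi vii viii ix));
    return ('x' x int($n / 10)) . $ones[$n % 10];
}

# Returns the value (1 .. 39) of the first numeral-like run in $text,
# or undef if there is none or it is out of range or malformed.
sub numeral_in_range {
    my ($text) = @_;
    return undef unless $text =~ m{ \b ([ivxIVX]+) \b }xms;
    my $roman = lc $1;
    my $n     = roman_to_int($roman);
    return undef unless $n >= 1 && $n <= 39;
    # Round-trip to reject sequences such as 'iiii' or 'ixixixix'.
    return $roman eq int_to_roman($n) ? $n : undef;
}

for my $s ('(xiv). foo', 'XXXIX. bar', 'ixixixix) baz', 'iiii. qux') {
    my $n = numeral_in_range($s);
    printf "%-16s => %s\n", "'$s'", defined $n ? $n : 'rejected';
}
```

The first two samples should come back as 14 and 39, and the last two should be rejected. Note that this only looks at the first run of numeral letters, so an English "I" earlier in the string would be picked up as 1.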

      But if you have to do all that, maybe it's better to hand-craft (and test!) your own  [i-xxxix] regex...


      Give a man a fish:  <%-{-{-{-<
