PerlMonks  


G'day AnomalousMonk,

With regard to the POSIX character class, ++hippo has already pointed out the problem: [[:alnum:]] is not limited to ASCII. You can certainly be forgiven for assuming otherwise, because the documentation appears to be wrong. From "perlrecharclass: POSIX Character Classes":

Perl recognizes the following POSIX character classes:

...

2. alnum Any alphanumeric character ("[A-Za-z0-9]").

I rarely use the POSIX classes and wasn't aware of that discrepancy. Anyway, while [[:alnum:]] is possibly "easier on the eye", it's likely to cause a fair amount of frustration for anyone debugging on the assumption that the documentation is correct.

The problem could be further exacerbated when input characters don't look like ones that should behave differently. While hippo's example using "LATIN SMALL LETTER C WITH CEDILLA" (ç) was fairly obvious, the glyphs for some characters (depending on the font) may be identical, or so similar that it's difficult to tell them apart. Consider "LATIN CAPITAL LETTER A" (A) and "GREEK CAPITAL LETTER ALPHA" (Α):

$ perl -C -E '
    use utf8;
    say "$_ (", ord $_, "): ", /\A[A-Za-z0-9]\z/ ? "✓" : "✗"
        for qw{A Α}
'
A (65): ✓
Α (913): ✗
$ perl -C -E '
    use utf8;
    say "$_ (", ord $_, "): ", /\A[[:alnum:]]\z/ ? "✓" : "✗"
        for qw{A Α}
'
A (65): ✓
Α (913): ✓
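
If the POSIX notation is preferred but ASCII-only matching is what's wanted, one option is the 'a' modifier (Perl 5.14 or later), which restricts the POSIX classes (along with \d, \s and \w) to the ASCII range. A minimal sketch; the helper name is_ascii_alnum is made up for illustration and isn't part of the original discussion:

```perl
use v5.14;
use warnings;

# Hypothetical helper, for illustration only: true if the string is
# exactly one ASCII alphanumeric character.
sub is_ascii_alnum {
    my ($char) = @_;
    return $char =~ /\A[[:alnum:]]\z/a ? 1 : 0;
}

# U+0041 LATIN CAPITAL LETTER A
# U+0391 GREEK CAPITAL LETTER ALPHA
# U+00E7 LATIN SMALL LETTER C WITH CEDILLA
for my $char ("\x{41}", "\x{391}", "\x{e7}") {
    printf "U+%04X: %s\n", ord $char,
        is_ascii_alnum($char) ? 'match' : 'no match';
}
```

With /a in effect, only U+0041 should be reported as a match.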

As far as the 'x' modifier goes, I don't disagree that it can improve readability; however, where it's felt necessary to use it (either because the regex is particularly complex or because it's code that junior developers will need to deal with), spreading the regex across multiple lines and including comments might be even better:

my $re = qr{
    \A                      # Assert start of string
    [A-Za-z0-9]             # Must start with one of these
    (?:                     # Followed by either
        [A-Za-z0-9_.-]*?    # Zero or more of these
        [A-Za-z0-9]         # But ending with one of these
    |                       # OR
                            # Nothing
    )
    \z                      # Assert end of string
}x;

And, with 5.26 or later, where the 'xx' modifier also allows blanks inside bracketed character classes, perhaps even clearer as:

my $re = qr{
    \A                          # Assert start of string
    [A-Z a-z 0-9]               # Must start with one of these
    (?:                         # Followed by either
        [A-Z a-z 0-9 _ . -]*?   # Zero or more of these
        [A-Z a-z 0-9]           # But ending with one of these
    |                           # OR
                                # Nothing
    )
    \z                          # Assert end of string
}xx;
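
A quick way to see what the pattern accepts is to run it over a handful of names. This is only a sketch: it uses the 'x' form of the regex (so it also runs on Perls older than 5.26), and the sample names are invented for illustration.

```perl
use strict;
use warnings;

my $re = qr{
    \A                      # Assert start of string
    [A-Za-z0-9]             # Must start with one of these
    (?:                     # Followed by either
        [A-Za-z0-9_.-]*?    # Zero or more of these
        [A-Za-z0-9]         # But ending with one of these
    |                       # OR
                            # Nothing
    )
    \z                      # Assert end of string
}x;

# Sample names invented for illustration.
my @names = (
    'a', 'file.txt', 'my-file_1.tar.gz',
    '.hidden', 'trailing.', 'has space', '',
);

for my $name (@names) {
    printf "%-20s %s\n", "'$name'", $name =~ $re ? 'ok' : 'rejected';
}
```

The first three names should be accepted (the single character via the empty alternative). The rest should be rejected: they start or end with a non-alphanumeric, contain a space, or are empty.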

We've already had exhaustive discussions about the 'm' and 's' modifiers. Use them if you want to follow PBP suggestions, but understand that they do absolutely nothing here: there are no '^' or '$' assertions that 'm' might affect, and there's no '.' (outside a bracketed character class) that 's' might affect.
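
For anyone who'd rather check than take that on trust, here's a rough sketch that compiles the same pattern with and without 'ms' and compares the results. The variable names and sample strings are made up for illustration; several samples contain newlines, which is where 'm' and 's' would matter if the pattern used '^', '$' or a bare '.'.

```perl
use strict;
use warnings;

# Same pattern as above, written compactly so it can be compiled twice.
my $core = q{\A[A-Za-z0-9](?:[A-Za-z0-9_.-]*?[A-Za-z0-9]|)\z};

my $plain   = qr/$core/;
my $with_ms = qr/$core/ms;

# Samples invented for illustration.
my @samples = ("abc", "abc\n", "a\nb", "\nabc", "a.b", "a b", "");

my $differences = 0;
for my $sample (@samples) {
    my $without = $sample =~ $plain   ? 1 : 0;
    my $with    = $sample =~ $with_ms ? 1 : 0;
    $differences++ if $without != $with;
}

print "differences: $differences\n";
```

Both versions should agree on every sample, so the count printed should be zero.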

— Ken


In reply to Re^3: Regex to detect file name by kcott
in thread Regex to detect file name by lirc201
