PerlMonks  

A question of 'or' in a Regex

by c (Hermit)
on May 28, 2003 at 14:46 UTC ( [id://261317]=perlquestion )

c has asked for the wisdom of the Perl Monks concerning the following question:

My regex is as follows:

/C(?:800)|(?:(?:35|29)(?:(?:50)|(?:00XL)))/

Concerning this snippet of code, I have two questions...

1. Is this using too many paren statements, or is this just per coder's discretion?

2. This regex matches on 'C3550XL' and I'm not sure why. I would expect C3500XL or C3550, but not this problematic mix of the two. My thought is that the statement:

(?:(?:50)|(?:00XL))

separates the 50 and the XL portions, ensuring a match only on strings where XL is directly preceded by two zeros.

Thanks in advance for your guidance.

-c

Replies are listed 'Best First'.
Re: A question of 'or' in a Regex
by broquaint (Abbot) on May 28, 2003 at 14:55 UTC
    1. Is this using too many paren statements, or is this just per coder's discretion?
    I'd say it's too many groupings, but it's similar to using extra parens in code to distinguish expressions: explicit, yet potentially confusing. Personally I'd simplify it to /C (?:800 | (?:35 | 29 )(?:50 | 00XL))/x. Also, the regex doesn't do quite what you think: the top-level | splits it into C800 or (?:(?:35|29)(?:50|00XL)), so the leading C belongs only to the 800 branch (hence the slightly modified regex I presented).
    2. This regex matches on 'C3550XL' and I'm not sure why
    It matches for the same reason that ABC3550DEF will match - the regex isn't anchored. So what you'll want is something like /\AC(?:800|(?:(?:35|29)(?:(?:50)|(?:00XL))))\z/ to get the expected match. See perlre for more info on \A and \z.
    HTH

    _________
    broquaint
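The two points in the reply above (the top-level alternation and the missing anchors) can be checked with a small script. This is only a sketch: `matched_part` is a hypothetical helper introduced here, not something from the thread, and the `//` operator assumes Perl 5.10 or later. The two patterns are copied verbatim from the question and from the reply.

```perl
use strict;
use warnings;

# Original pattern from the question (unanchored, top-level alternation).
my $original = qr/C(?:800)|(?:(?:35|29)(?:(?:50)|(?:00XL)))/;
# Anchored pattern suggested in the reply above.
my $anchored = qr/\AC(?:800|(?:(?:35|29)(?:(?:50)|(?:00XL))))\z/;

# Hypothetical helper: return the substring that matched, or undef.
sub matched_part {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

# Show which part of each string (if any) each pattern matches.
for my $model (qw(C800 C3550 C3500XL C3550XL ABC3550DEF)) {
    printf "%-10s original: %-8s anchored: %s\n",
        $model,
        matched_part($original, $model) // '-',
        matched_part($anchored, $model) // '-';
}
```

For C3550XL the original pattern reports the substring 3550, without the C and without the XL, which is why that string appeared to match; the anchored pattern rejects it.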

      (Update: I first wrote that you had parsed it incorrectly, but it was I who parsed broquaint's regex incorrectly.) By just adding whitespace to the original regex I get the following:

      / C (?: 800 ) | (?: (?: 35 | 29 ) (?: (?: 50 ) | (?: 00XL ) ) ) /x

      This shows that some of the (?:) groups were completely useless. They impose zero runtime overhead, so they only really matter because they make the regex harder for the next programmer to read. Removing the useless (?:) groups, I get:

      / C800 | (?: 35 | 29 ) (?: 50 | 00XL ) /x
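A quick way to convince yourself that removing those groups changes nothing is to run both forms over the same inputs. The sketch below does that and also includes broquaint's regrouped variant for contrast. The variable names and the sample strings are made up for illustration; only the three patterns come from the thread.

```perl
use strict;
use warnings;

# The original regex with whitespace added, as shown above.
my $spaced     = qr/ C (?: 800 ) | (?: (?: 35 | 29 ) (?: (?: 50 ) | (?: 00XL ) ) ) /x;
# The same regex with the redundant (?:) groups removed.
my $simplified = qr/ C800 | (?: 35 | 29 ) (?: 50 | 00XL ) /x;
# broquaint's variant, where the leading C applies to every branch.
my $regrouped  = qr/ C (?: 800 | (?: 35 | 29 ) (?: 50 | 00XL ) ) /x;

# Illustrative inputs, with and without the leading C.
my @samples = qw(C800 800 C3550 3550 C2900XL 2900XL C3500 X2950);

for my $s (@samples) {
    # 1 if the pattern matches somewhere in the string, 0 otherwise.
    my @hit = map { $s =~ $_ ? 1 : 0 } $spaced, $simplified, $regrouped;
    printf "%-8s spaced=%d simplified=%d regrouped=%d\n", $s, @hit;
}
```

The first two columns should agree on every input, while the regrouped pattern differs on strings such as 3550 that lack the leading C.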
Re: A question of 'or' in a Regex
by BrowserUk (Patriarch) on May 28, 2003 at 15:58 UTC

    If your strings are stand-alone, then using anchors (\A & \Z (or ^ & $ unless you're using the /m option)) will work fine, but if you're trying to detect matches for this regex within the context of a larger string, then you can achieve this using a few zero-width assertions.

    #! perl -slw
    use strict;

    while( <DATA> ) {
        chomp;
        print if m[
            \bC                     # Starting with C preceded by a non-\w char
            (?:                     # and either
                800                 #   800
                |                   # or
                (?: 29 | 35 )       #   29 or 35
                (?:                 #   and
                    (?: 50(?!XL) )  #     50 not followed by XL
                    |               #     or
                    (?: 00XL )      #     00XL
                )
            )\b                     # followed by a word break
        ]x;
    }
    __DATA__
    This is the spec for the ABC800. This device is a
    This is the spec for the C800. This device is a
    This is the spec for the C2900. This device is a
    This is the spec for the C2950. This device is a
    This is the spec for the C2900XL. This device is a
    This is the spec for the C2950XL. This device is a
    This is the spec for the C3500. This device is a
    This is the spec for the C3550. This device is a
    This is the spec for the C3500XL. This device is a
    This is the spec for the C3500XLM. This device is a
    This is the spec for the C3550XL. This device is a
    __OUTPUT__
    D:\Perl\test>junk
    This is the spec for the C800. This device is a
    This is the spec for the C2950. This device is a
    This is the spec for the C2900XL. This device is a
    This is the spec for the C3550. This device is a
    This is the spec for the C3500XL. This device is a
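If the goal is to pull the model names out of running text rather than print whole lines, the same word-boundary idea can be combined with /g in list context. The following is a minimal sketch under assumptions: `find_models`, `$model_re`, and the sample sentence are invented for illustration. It also drops the (?!XL) lookahead, on the assumption that the trailing \b already rejects strings such as C2950XL; verify that against your own data.

```perl
use strict;
use warnings;

# Word-bounded model pattern: C800, or C followed by 29/35 and then 50/00XL.
my $model_re = qr/
    \b C
    (?: 800 | (?: 29 | 35 ) (?: 50 | 00XL ) )
    \b
/x;

# Hypothetical helper: return every model name found in the text.
sub find_models {
    my ($text) = @_;
    # In list context, a /g match with one capture group returns all captures.
    return $text =~ /($model_re)/g;
}

my $text = 'We stock the C800, C2950XL, C3500XL and ABC800; the C3550 ships later.';
print join(', ', find_models($text)), "\n";   # prints "C800, C3500XL, C3550"
```

Here C2950XL is skipped because there is no word boundary between 50 and XL, and ABC800 is skipped because the C is preceded by a word character.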

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

