
Re: Common Regex Gotchas -- "(:?"

by shenme (Priest)
on Sep 29, 2005 at 18:28 UTC (id://496199)

in reply to Common Regex Gotchas

Summary: "(:?" is a quiet guy, but not as well-mannered and quick as that "(?:" fellow.

When the regex syntax was extended with features like zero-width negative look-ahead, the authors tried very hard to pick syntax that could not be mistaken for any existing, 'real' regex code. So every new construct starts with '(?'. It turns out that this makes typos a bit too easy to make, and far too quiet.
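For anyone who hasn't met the family, here is a minimal sketch (core Perl only) of a few "(?X" constructs side by side, where the character right after '(?' picks the feature. The sample word 'forefend' and the hash %result are just illustrations.

```perl
use strict;
use warnings;

my $s = 'forefend';
my %result;

# (?:...) clusters without capturing, so "fend" lands in $1
$result{cluster}   = $s =~ /^(?:fore)(fend)$/ ? $1 : 'no match';

# (?=...) zero-width positive look-ahead: "fend" must follow but is not consumed
$result{lookahead} = $s =~ /fore(?=fend)/ ? $& : 'no match';

# (?!...) zero-width negative look-ahead: fails here because "fend" does follow
$result{neg_ahead} = $s =~ /fore(?!fend)/ ? $& : 'no match';

# (?#...) is an inline comment that the engine ignores
$result{comment}   = $s =~ /fore(?# ignored )fend/ ? $& : 'no match';

print "$_: $result{$_}\n" for qw(cluster lookahead neg_ahead comment);
```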

I came across the following in a CPAN module:

What the RE does matters less than two facts: 1) it doesn't work as intended, and 2) it doesn't fail loudly.

The writer intended to use "(?:", the clustering (non-capturing) group. It groups a subexpression without capturing what it matched. For instance, you might want to say that a complex inner match is optional, e.g.

... ( contains \s+ (?:this|that)? \s+ item ) ...
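A small runnable version of that fragment, assuming a made-up input string and the /x flag so the spacing can stay as written. Only the outer parentheses capture; the (?:...) cluster just lets the '?' apply to the whole alternation.

```perl
use strict;
use warnings;

my $text = 'the box contains that item';

# /x ignores the literal spaces, so the layout mirrors the fragment above.
# (?:this|that) groups the alternation for the "?" but adds no capture,
# so the outer group is the only one returned.
my @caps = $text =~ / ( contains \s+ (?:this|that)? \s+ item ) /x;

print scalar(@caps), " capture(s): '$caps[0]'\n";
```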

But tyops happen. What is the result if you reverse the ':' and '?' characters? Nothing drastic, usually.

In "(:? pattern )" the '?' keeps its original meaning as a quantifier, so the ':' becomes an optionally matched literal character. The parentheses also revert to their original meaning: a capturing group.
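To see the swap in action, this sketch runs the intended pattern and the typo'd one against two made-up inputs. It relies on $#+, which after a successful match holds the number of capture groups in the pattern.

```perl
use strict;
use warnings;

# The pattern as intended, and the same pattern with ':' and '?' swapped.
my %pattern = (
    intended => qr/contains\s+(?:this|that)?\s+item/,
    typo     => qr/contains\s+(:?this|that)?\s+item/,
);

my %outcome;
for my $input ('contains this item', 'contains :this item') {
    for my $name (sort keys %pattern) {
        if ($input =~ $pattern{$name}) {
            my $groups   = $#+;    # number of capture groups in the pattern just matched
            my $captured = defined $1 ? $1 : 'undef';
            $outcome{$name}{$input} = "match, $groups group(s), \$1=$captured";
        }
        else {
            $outcome{$name}{$input} = 'no match';
        }
        printf "%-8s %-22s %s\n", $name, "'$input'", $outcome{$name}{$input};
    }
}
```

With these inputs the typo'd pattern matches both strings and reports one group, while the intended one reports zero groups and rejects ':this'.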

So usually the only result is that the regex is a bit slower and captures more substrings, which also shifts the numbering of any capture groups that come after it. It also lets a stray ':' in the input match. If you aren't checking how many captures come back from a successful match, you might never notice the typo.
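One way to do that checking, sketched under the assumption that your patterns live in qr// objects: a hypothetical capture_count helper that forces a match against the empty string and then reads $#+. A pattern that is only supposed to cluster should report zero.

```perl
use strict;
use warnings;

# Hypothetical helper: count the capture groups a pattern defines.
# The empty alternative after $re guarantees a successful match against '',
# and after a successful match $#+ holds the pattern's number of groups.
sub capture_count {
    my ($re) = @_;
    '' =~ /(?:$re)|/ or die "empty alternative should always match";
    return $#+;
}

my $intended = qr/contains\s+(?:this|that)?\s+item/;
my $typo     = qr/contains\s+(:?this|that)?\s+item/;

# Both patterns are meant to cluster only, so anything above zero is suspect.
for my $re ($intended, $typo) {
    my $n = capture_count($re);
    print $n == 0 ? "ok      $re\n" : "SUSPECT $re has $n capture group(s)\n";
}
```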

But note that this typo can happen with any of the single-character "(?X" constructs. You might notice it right away if your "(#? comment )" happened to cause a syntax error. And you should notice it when tests that check what "fore(=?fend)" matched start failing. Otherwise these typos fail silently.
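Both typos from that paragraph, in a sketch with made-up inputs. With "fore(=?fend)" the match still succeeds on 'forefend', so only a test that looks at what was consumed ($& here) or feeds it something like 'fore=fend' will catch it. With "(#? comment )", whether you get an error depends on /x: without it the pattern compiles quietly, while under /x the '#' starts a comment that runs to the end of the pattern and swallows the ')', so perl should refuse to compile it.

```perl
use strict;
use warnings;

# 1) "(=?" instead of "(?=": the look-ahead becomes a capturing group
#    that consumes an optional '=' followed by "fend".
my %pattern = (
    intended => qr/fore(?=fend)/,
    typo     => qr/fore(=?fend)/,
);
my %got;
for my $input ('forefend', 'fore=fend') {
    for my $name (sort keys %pattern) {
        # $& is the text the whole pattern consumed
        $got{$name}{$input} = $input =~ $pattern{$name} ? $& : '(no match)';
        print "$name on '$input': $got{$name}{$input}\n";
    }
}

# 2) "(#?" instead of "(?#": compiled at run time inside eval so a
#    failure is caught instead of aborting the script.
my $src = 'fore(#? comment )fend';
my %compiles;
for my $mode ('plain', 'x') {
    my $re = $mode eq 'x' ? eval { qr/$src/x } : eval { qr/$src/ };
    $compiles{$mode} = defined $re ? 1 : 0;
    print "$mode: ", (defined $re ? "compiles as $re\n" : "error: $@");
}
```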

Now, this is a minor gotcha, except that it turns up in 15 nodes here, another node mentions it in an aside, and yet another node reports finding the typo in a book. I wonder if it is in your code?
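If you want to check, here is a rough sketch of a scanner. The name suspect_lines and the character set [:=!#>] are illustrative choices, not anything standard. It is a line-based heuristic rather than a parser, so it will flag a deliberate "(:?" (an optional colon in a capture) and miss a typo whose pieces are split across lines.

```perl
use strict;
use warnings;

# Heuristic: "(" + a single-character extension letter + "?", i.e. "(?X"
# with its first two characters swapped. An escaped "\(" is skipped,
# since that is a literal parenthesis rather than a group.
my $swapped = qr/(?<!\\)\([:=!#>]\?/;

# Hypothetical helper: return the 1-based numbers of suspicious lines.
sub suspect_lines {
    my ($text) = @_;
    my @lines = split /\n/, $text;
    return grep { $lines[$_ - 1] =~ $swapped } 1 .. scalar @lines;
}

# Scan files named on the command line, grep -n style.
if (@ARGV) {
    while (my $line = <>) {
        printf "%s:%d: %s", $ARGV, $., $line if $line =~ $swapped;
    }
    continue {
        close ARGV if eof;    # reset $. for each file
    }
}
```

Run it as something like perl scan.pl lib/*.pm (the script name is hypothetical); with no arguments it only defines the helper.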

perlre - Extended Patterns