PerlMonks  

Regexes Are Not For String Comparison

by japhy (Canon)
on Apr 24, 2001 at 18:40 UTC ( [id://75068]=perlmeditation )

Never again do I want to see code like:
if ($foo =~ /^$bar$/) { ... }
if ($foo =~ /^$bar$/i) { ... }
if ($foo =~ /\A$bar\Z/i) { ... }
if ($foo =~ /\A$bar\z/i) { ... }
If you were to really want to use a regex, you'd use
/^\Q$bar\E\z/i
But you wouldn't even then, because you're not that silly. Use string comparison functions where they should be used. Regexes are for patterns -- =~ is the "pattern-match binding operator".

Sigh. And this, coming from Mr. I Love Regexes. It must be serious.

if (lc($foo) eq lc($bar)) { ... }
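
To make the difference concrete, here is a minimal sketch rather than anything from the original post. The helper names same_string_ci and naive_regex_ci are made up for illustration. It shows two ways the unquoted regex says "equal" when the strings are not: metacharacters in $bar are treated as pattern syntax, and $ also matches before a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper names, chosen for this sketch only.

# Case-insensitive equality using plain string operators.
sub same_string_ci {
    my ($foo, $bar) = @_;
    return lc($foo) eq lc($bar) ? 1 : 0;
}

# The unquoted regex version: $bar is interpolated as a pattern.
sub naive_regex_ci {
    my ($foo, $bar) = @_;
    return $foo =~ /^$bar$/i ? 1 : 0;
}

# "." in $bar acts as a wildcard in the regex, so a different string matches.
print "naive regex, metachar: ", naive_regex_ci('fooXbar', 'foo.bar'), "\n";
print "lc/eq, metachar:       ", same_string_ci('fooXbar', 'foo.bar'), "\n";

# "$" also matches just before a trailing newline.
print "naive regex, newline:  ", naive_regex_ci("foo\n", 'foo'), "\n";
print "lc/eq, newline:        ", same_string_ci("foo\n", 'foo'), "\n";
```

Whether the newline case matters depends on where $foo comes from, but with eq the answer never depends on what characters happen to be in $bar.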


japhy -- Perl and Regex Hacker

Replies are listed 'Best First'.
Re: Regexes Are Not For String Comparison
by frag (Hermit) on Apr 24, 2001 at 19:56 UTC

    I've done this sort of thing lots of times before myself (using \Q\E or quotemeta(), at least). Reading your post, I've been trying to figure out why, and I've reached this conclusion: When your most impressive tools are regexes, everything looks like a pattern.

    In the big cognitive precedence tables stored in most Perl users'(*) wetware, m// ranks above lc(). So writing this kind of code is understandable. Plus, lc() is usually presented (or else, can get pigeonholed after learning about it) as a tool for changing a string, so someone trying this sort of comparison might think "I need to compare two things, not change the identity of one". So using lc() with eq might not even come up for consideration.

    I'm just meditating on the reasons why we pick certain coding forms. We all know TMTOWTDI, but how do people come to choose their ways?

    -- Frag.

    (*)IMHO, this isn't just me.

      But I find that the simpler tools are rarely used, while the more complex ones are used oft and poorly. Many people see regexes, and never understand them; but that does not stop them from using them. And the more regexes get used by those who don't understand them, the more they seem to be mysterious giants.

      I see people that put /(.*)/ in their code for no reason. It's just waiting for data to come along and break it.

      japhy -- Perl and Regex Hacker

        I'm not disagreeing (I think), but I'm trying to get at why people reach for the more complex ones. Using m// isn't by itself complex and is usually one of the first Way Cool Perl features taught to newbies, and people can easily think that they understand the basic regex components (like $1 or /i or $^ or .*) that they've read about or seen used in tutorials. If you're saying that people use it just to make themselves look 'leet, I don't want to discount that, but it's more than just that -- there are reasons for thinking and choosing certain things that have to do with the way the language is structured, or with how the language is learned.

        -- Frag.

        P.S. I'm confused on the problem with /(.*)/ -- do you mean /"(.*)"/? How can copying an entire string into $1 break? (Although it is certainly dumb.)
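
One plausible reading of the /(.*)/ warning (an assumption, since the thread does not spell it out) is that . does not match a newline unless the /s modifier is given, so the capture is not always "the entire string". A small sketch with hypothetical helper names and made-up input:

```perl
use strict;
use warnings;

# Hypothetical helper: what /(.*)/ actually captures from a string.
sub capture_dot_star {
    my ($data) = @_;
    my ($captured) = $data =~ /(.*)/;    # no /s, so "." stops at "\n"
    return $captured;
}

# Same pattern with /s, where "." is allowed to match "\n" as well.
sub capture_dot_star_s {
    my ($data) = @_;
    my ($captured) = $data =~ /(.*)/s;
    return $captured;
}

my $data = "first line\nsecond line";    # made-up input containing a newline

print "without /s: [", capture_dot_star($data), "]\n";
print "with /s:    [", capture_dot_star_s($data), "]\n";
```

If the intent is only to copy the string, a plain assignment does that without depending on the data.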

(dws)Re: Regexes Are Not For String Comparison
by dws (Chancellor) on Apr 24, 2001 at 21:00 UTC
    <counteropinion>

    Unless performance is a concern, go with what's most readable.

    For a lone string test, use lc() and eq. But if you've already used a couple of regexes to test $foo, it may be more readable to throw in another one, rather than switching gears and using lc() and string compare.

    Consider adding a test below this fragment:

    next if $foo =~ /^#/;
    if ( $foo =~ /^(?:this|that|or|something|else)$/ ) { ... }

    Now which reads better,
      if ( $foo =~ /^$bar$/i ) { ... }
    or
      if ( lc($foo) eq lc($bar) ) { ... }
    I'll argue that it's at best a toss-up. As a reader of fragments like this, I see the pattern of a variable being tested against a sequence of regexes, and my brain goes into regex scanning mode. Mixing in a string comparison might be technically correct, but it breaks up the reading flow, at least in examples like this.

    </counteropinion>

Re: Regexes Are Not For String Comparison
by buckaduck (Chaplain) on Apr 25, 2001 at 02:04 UTC
    I will timidly confess to a preference for the regex solution when comparing to $_, such as:
    foreach (@names) {
        next if /^buckaduck$/i;
        ...
    }
    Because I have a real distaste for writing $_ explicitly whenever it's not absolutely necessary. This is partly for readability -- some of the Perl novices at my work are still afraid of $_ ... (And yet they don't mind the implicit use of $_. Strange.) And it's also just plain easier to type than the more proper alternative:
    foreach (@names) {
        next if ( lc($_) eq lc('buckaduck') );
        ...
    }
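
The two loops are not quite interchangeable, which a short sketch can show. The list contents and sub names below are hypothetical. The regex form also accepts a name with a trailing newline (as unchomped input would have), and lc() with no argument already operates on $_, so the eq form does not have to spell $_ out either.

```perl
use strict;
use warnings;

# Hypothetical list; the last entry keeps a trailing newline, as unchomped input would.
my @names = ('Buckaduck', 'japhy', "buckaduck\n");

# Collect the names the regex test would skip.
sub skipped_by_regex {
    my @skipped;
    foreach (@_) {
        push @skipped, $_ if /^buckaduck$/i;        # "$" also matches before a final "\n"
    }
    return @skipped;
}

# Collect the names the string comparison would skip.
sub skipped_by_eq {
    my @skipped;
    foreach (@_) {
        push @skipped, $_ if lc() eq 'buckaduck';   # lc() with no argument reads $_
    }
    return @skipped;
}

printf "regex skips %d names, eq skips %d names\n",
    scalar(skipped_by_regex(@names)), scalar(skipped_by_eq(@names));
```

Which behaviour is wanted for the newline case is a judgment call; the point is only that the two tests answer slightly different questions.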

    buckaduck

Re: Regexes Are Not For String Comparison
by princepawn (Parson) on Apr 24, 2001 at 23:45 UTC
    Uhm, japhy, unless I am mistaken, this node is a rehash of your node Code Smarter, which is one of the PerlMonks Best Nodes.

    But the more times people see your wisdom re-distilled and re-stated, the more likely they are to actually allow it to seep into their skulls, which are only 1/10th as thick as mine.

      Yes, it is a repetition of a main point in that node, but I felt I had to bring it to light again, since there were some nodes about such misuse (in my opinion) of regexes.

      japhy -- Perl and Regex Hacker
Re: Regexes Are Not For String Comparison
by DeusVult (Scribe) on Apr 25, 2001 at 23:51 UTC
    Regexes are for patterns -- =~ is the "pattern-match binding operator".

    I have to mildly disagree. I agree that /^$bar$/ is a dumb regex, but often string literal regexes without either the ^ or the $ are a perfectly reasonable idea.

    Now it may be the sort of scripts I've been writing recently, but I almost never find an occasion to use the "eq" operator. And it isn't just that I want case insensitive matches. It's most often whitespace or added junk characters. If I'm trying to match the string "foobar" I don't really care if it is actually "foobar ". Or if I'm looking for the string "no such file or directory" I don't care if I hit "no such file or directory at line 26 of foobar.pl". So I'll use /^no such file or directory/i without a second thought.
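
For comparison, this is roughly what the same "starts with" test looks like without a regex. It is a sketch, not a recommendation: starts_with_ci is a hypothetical helper, and the error text is the example from the paragraph above with its first letter capitalised to exercise the case folding.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $string begins with $prefix, ignoring case.
sub starts_with_ci {
    my ($string, $prefix) = @_;
    return lc(substr($string, 0, length $prefix)) eq lc($prefix) ? 1 : 0;
}

my $error = "No such file or directory at line 26 of foobar.pl";

# The anchored regex from the post.
print "regex:  ", ($error =~ /^no such file or directory/i ? "match" : "no match"), "\n";

# The substr/lc/eq equivalent.
print "substr: ", (starts_with_ci($error, 'no such file or directory') ? "match" : "no match"), "\n";
```

With a fixed literal prefix the regex is arguably the shorter and clearer of the two. The string version mostly earns its keep when the prefix lives in a variable whose contents are not known in advance.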

    So I'm not really sure how strict you were being in your definition of patterns (personally, when I read the word "pattern" I think of masses of '\'ed characters), but I often find regexes a nice fit for matching plain old strings with a bit of leeway. I don't know if I'm interpreting you as being more draconian than you intended, but there is a place for regexes in certain types of string comparison.

    But /^$bar$/ really is dumb. Although my personal favorite for stupid regexes is /.*$bar.*/
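
A quick sketch of why the padded form buys nothing as a yes/no test: the leading and trailing .* can each match the empty string, so it succeeds exactly when the bare pattern does. The sub names are hypothetical, and \Q...\E is added here (it is not in the pattern quoted above) so that all three tests treat $bar as a literal string. index() is shown as the non-regex way to ask the same question.

```perl
use strict;
use warnings;

# Hypothetical helpers; each answers "does $foo contain $bar?".

sub contains_padded {
    my ($foo, $bar) = @_;
    return $foo =~ /.*\Q$bar\E.*/ ? 1 : 0;   # the .* on each side changes nothing
}

sub contains_plain {
    my ($foo, $bar) = @_;
    return $foo =~ /\Q$bar\E/ ? 1 : 0;
}

sub contains_index {
    my ($foo, $bar) = @_;
    return index($foo, $bar) >= 0 ? 1 : 0;   # literal substring search, no regex
}

for my $foo ('xxfoobarxx', 'foobar', 'nothing here') {
    printf "%-14s padded=%d plain=%d index=%d\n",
        $foo, contains_padded($foo, 'foobar'),
        contains_plain($foo, 'foobar'), contains_index($foo, 'foobar');
}
```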

    If you have any trouble sounding condescending, find a Unix user to show you how it's done.
    - Scott Adams
