Regex Misuse
by srawls (Friar) on May 14, 2001 at 04:34 UTC
I know most of you already know this, but I have seen a lot of misuse of regexes, so I thought I'd write this for beginners to read.
Many people get 'regex happy' and reach for a regex when a much faster built-in function would do the job. Here's an example:
This is some advice a notable Perl monk gave someone asking how to implement fixed-width columns for data files. The goal was to extract $width characters from a variable. More experienced programmers are shaking their heads right now; they know it would be much more efficient to use this:
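A minimal sketch of the two approaches side by side. The regex here is only a guess at what the advice looked like (something along the lines of /^(.{$width})/), and $line is a made-up record:

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: two 10-character columns and a number.
my $line  = sprintf("%-10s%-10s%s", "Alice", "Smith", 42);
my $width = 10;

# Regex approach (assumed form): match exactly $width characters at the start.
my ($by_regex) = $line =~ /^(.{$width})/;

# substr approach: take $width characters starting at offset 0.
my $by_substr = substr($line, 0, $width);

print "regex:  '$by_regex'\n";
print "substr: '$by_substr'\n";
```

Both variables end up holding the same first column, padding included; substr just gets there without involving the regex engine.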
You see, regexes are a very powerful tool, but they are not fast (well, relatively speaking). It is much faster to say, "take this many characters from this variable, starting at this position in the string," than it is to say, "take the input, see if '.' matches the next character, and then repeat that this many times." The regex also has to be compiled before it can be used, which is extra work that substr never has to do.
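If you'd rather measure than take my word for it, the core Benchmark module can compare the two. This is only a rough sketch: the 80-character $line and the width of 10 are arbitrary choices, and the numbers will vary with your perl and your machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $width = 10;          # arbitrary column width
my $line  = "x" x 80;    # arbitrary 80-character record

# The two candidates, each returning the first $width characters.
my $via_regex  = sub { my ($field) = $line =~ /^(.{$width})/; $field };
my $via_substr = sub { substr($line, 0, $width) };

# A negative count means "run each for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative speeds.
cmpthese(-1, {
    regex  => $via_regex,
    substr => $via_substr,
});
```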
Another common mistake is trying to match the whole string when you only need to match part of it. Here are some examples:
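Something along these lines, anyway. These are illustrative reconstructions rather than the exact snippets, and $text and $file are made-up inputs:

```perl
use strict;
use warnings;

# Hypothetical input; string_1 and string_2 are the two parts to swap.
my $text = "before string_1 middle string_2 after";

# Wasteful swap: the first and last (.*) capture text that is only
# copied straight back into the replacement, unchanged.
(my $swapped = $text) =~ s/^(.*)(string_1)(.*)(string_2)(.*)$/$1$4$3$2$5/;

# Wasteful suffix check: ^.* makes the pattern cover the entire name
# when only the ending matters.
my $file   = "notes.txt";
my $is_txt = $file =~ /^.*\.txt$/ ? 1 : 0;

print "$swapped\n";
print "is_txt: $is_txt\n";
```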
The first example (actually taken from this site) can be improved by taking out the first and last (.*); we don't need to match the beginning and end of the string, because that's not what we are switching around (the regex is used to swap string_1 and string_2). The second regex (just an example I made up now) is meant to check whether a file name ends in .txt. It is extremely wasteful: we only need to match the .txt part, not the whole string. The improved regexes are below:
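A sketch of what the trimmed-down versions could look like, again with made-up $text and $file values and assuming each of string_1 and string_2 appears once in the input:

```perl
use strict;
use warnings;

# Hypothetical input; string_1 and string_2 are the two parts to swap.
my $text = "before string_1 middle string_2 after";

# Swap without touching the text before string_1 or after string_2.
(my $swapped = $text) =~ s/(string_1)(.*)(string_2)/$3$2$1/;

# Check only the end of the file name.
my $file   = "notes.txt";
my $is_txt = $file =~ /\.txt$/ ? 1 : 0;

print "$swapped\n";
print "is_txt: $is_txt\n";
```

The substitution leaves everything outside the matched span alone, so there is no need to capture it and put it back.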
There are probably other common mistakes, but I'm just writing down the ones I have recently seen and that are in my head right now. If someone else wants to post a common mistake in a reply to help the beginners, I would very much appreciate it.