Perl Regular Expressions - Do's and Don'ts

Introduction

Many of the more experienced users may read this article and think, "look how cute, he's stating the obvious."
That is fine. This article is written with regular expression newbies in mind. I often see stupid mistakes, and time and again someone has to ask, "why do you use /g when you do not want to search globally?" Or, "why are you matching case-insensitively (/i) on a [A-Za-z] character class?"

I want to collect as many mistakes like this as possible and put them in this article, so that we have something to refer to when another stupid regex mistake is made.
Note: this is not a regex tutorial nor a regex howto. Neither will I explain how to do The Obvious. If you have any questions regarding regexen after reading this article, RTFM, read the books, read the tutorials, use the Super Search, use Google.

Also know that this article is not here to tell you what you should and shouldn't do. It represents my view on things, whether or not that view is based on the views of others. When possible, I will also explain why I hold a particular view and give examples, counterexamples and benchmarks.

One more thing before I start. If you ever find this article anywhere other than perlmonks.[org|net|com], most of the links will not work (unless the copier has done his job right), for Perl Regular Expressions - Do's and Don'ts was written especially with Perl Monks in mind.

Jargon

Before I finally start off, let's set some terminology.
For example, it might be useful to know that "regular expression(s)", "regexp(s)", "regex(es/en)", and "RE('s)" all refer to the same thing: regular expression(s).

Seven Rules of Thumb

These are the most important do's and don'ts to keep in mind when regexing.

1. Do use strict; and do use warnings;. Do use diagnostics; when you don't understand error messages. Do use these pragmas just because it is a good habit.: I think you should know why.
2. Do use the -T switch (taint mode; for a simple overview, check the Perl Security (perlsec) manpage) in a responsible manner when input from external sources may be unsafe.: When a script runs in taint mode, values from external sources (user input, environment variable values (and therefore CGI GET or POST values), input from files, etc) are considered 'tainted'. This means you can't do certain things with them, like eval. Also, things like unlink, chmod and the like are prohibited on tainted data. This is very useful, because possibly unsafe data will not harm the system. Whenever you pass a tainted value to another variable, it is still considered tainted: not the variable, but the value is tainted. There are several ways to untaint data, which I am not about to mention here. You should check the above-mentioned Perl Security (perlsec) manpage.
The easiest way to make a perl CGI script run in taint mode, is by making this the first line:
#!/usr/bin/perl -T
The path to perl on your system may vary, but you get the idea. Optionally, you can also add the -w switch to enable warnings.
Note: some people say data from external sources is always unsafe. Personally, I don't agree, but it's worth mentioning.
3. Don't trust users, or the programs under their control. Some are ignorant, some are malicious. It's also possible people suffer from "Fat fingeritis" (© schodckwm) and accidentally enter something evil :).: Some users don't know how a system is meant to be used and use it wrong. They might walk through security holes in your script you might not know about. So, make your scripts foolproof. Others might test the limits of your script and damage your system. This may or may not happen as collateral damage. Keep them away.
It is wise to check whether users entered a correct value. For example, when you've written a simple menu where the user should choose a number from 1 to 5, inclusive, you should check that they really entered a valid number (though this one is pretty obvious). Another example: say you have a website. Users are allowed to choose a nickname with a minimum of 3 and a maximum of 16 characters. Of course you could easily set a max length property on the input field. But a more experienced user is able to save the HTML file to disk and change the value of the max length property, thereby allowing a much longer nickname. Don't trust users.
4. Do know what you want to achieve. • Do know how to achieve that. • • That is, do understand regular expressions.: If you want to paint something black, you will want to buy black paint. If you want to match a string for only lower case letters, you will not want to use the /i modifier.
5. Don't use regexes for formats without a definite syntax, like human language.: Regexes are good for pattern matching (and substitution), not for language analysis.
Note that regexes can be used to find, for example #include statements in a C++ file, but then realise regexes are not the right tool to parse a C++ file. Consider the use of Parse::RecDescent instead.
6. Do comment your code: When you re-read it, you will have forgotten what your code should do. Others won't know anyway. Also, see point four in the next chapter.
7. Do use CGI; (have a look at CPAN) when writing CGI scripts. • that is, don't reinvent the wheel. • • that is, do use modules.: The CGI module offers you many functions to handle CGI data. The most important, in my view, is the param() function, which allows you to get HTTP parameter values, like filled-in form fields. You could of course easily write functions like this yourself but most probably, your function won't be as good as the one in the CGI module.
This is the case with many modules. You can do it, but it will cost you too much time to do it as well as the module's author. And besides, why would you want to reinvent the wheel?
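The two checks from rule 3 could look roughly like the sketch below. The sub names are made up for this illustration, and what characters a nickname may contain is my own assumption (letters, digits and underscore); adjust that to your site's rules.

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are mine and not taken from any module.

# True only for a single digit from 1 to 5.
sub valid_menu_choice {
    my ($input) = @_;
    return (defined $input && $input =~ m/\A[1-5]\z/) ? 1 : 0;
}

# True for 3 to 16 characters. The allowed character set is an assumption.
sub valid_nickname {
    my ($input) = @_;
    return (defined $input && $input =~ m/\A[A-Za-z0-9_]{3,16}\z/) ? 1 : 0;
}

for my $choice ("3", "42", "five") {
    print "menu '$choice': ", (valid_menu_choice($choice) ? "ok" : "rejected"), "\n";
}
for my $nick ("muba", "ab", "x" x 40) {
    print "nick '$nick': ", (valid_nickname($nick) ? "ok" : "rejected"), "\n";
}
```

I used \A and \z instead of ^ and $ here because $ also matches just before a trailing newline, which would let "5\n" slip through.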

As you will see, you won't need the rest of this article if you apply these seven rules correctly.

What does this mean?

A general list with some more rules to keep in mind. These rules actually ensue from the Seven Rules of Thumb.

1. Don't use the /g (global matching) modifier when you don't want to search globally (that is, through the entire string).

Global matching keeps searching through the entire string, whereas non-global matching stops as soon as it finds what it's looking for (it only reaches the end of the string when the search pattern isn't there). Besides being faster without /g when you don't need it, you wouldn't use a hammer to drive a screw into something, would you? Of course it is possible, but it is a bit much.

To prove my point, here's a benchmark. I'll run two regexes 1,000,000 times: one of them without global search, the other one with global search.

#!/usr/local/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);


cmpthese(1_000_000, {
    "Non-global" => sub { "Perl is cool" =~ m/Perl/ },
    "Global" => sub { "Perl is cool" =~ m/Perl/g }
    }
);

This is the result:

                Rate     Global Non-global
Global     1351351/s         --        -9%
Non-global 1492537/s        10%         --

As you can see, the non-global search runs about 1,492,537 times per second, whereas the global search runs only 1,351,351 times per second.
You might think, "so what? Both of them are that fast, you won't even notice the difference." True, but note that this is only a very simple regular expression and that the test script is very small. As you might find out one day, your regexes will grow more and more complicated, as will your scripts. And why would you want to give your CPU a hard time?
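What /g actually does depends on the context in which the match is evaluated. Here is a small sketch of the three cases I know of; the sample string and variable names are just for illustration.

```perl
use strict;
use warnings;

my $text = "Perl is cool, Perl is fun";

# Without /g, list context, no capture groups: returns (1) on success.
my @first = $text =~ m/Perl/;

# With /g in list context: returns every match in the string.
my @all = $text =~ m/Perl/g;

# With /g in scalar context: one match per evaluation; pos() remembers
# where the previous match ended, so the loop walks through the string.
my @positions;
while ($text =~ m/Perl/g) {
    push @positions, pos($text);
}

print "without /g: @first\n";
print "with /g:    ", scalar(@all), " matches, ending at offsets @positions\n";
```

So if you only want to know whether the pattern occurs at all, the plain match says everything you need, and the /g only makes the reader wonder what the loop or list was supposed to be.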

2. Don't use the /i (case-insensitive matching) modifier when case does matter. Don't use the /i modifier with regexes like m/[A-Za-z]/i.

For example, when there is a difference between "perl" (the executable) and "Perl" (the language), and you want to search a large file for all references to the language ("Perl"), don't check with m/Perl/i. You won't know whether a hit refers to the executable or to the language.
Do use the /i modifier when you don't know what to expect. For example, don't expect all users to know the difference between "perl", "Perl", and "PERL" (which is wrong, but many users seem not to know that). In that case, case-insensitive checking is a good idea.
The regex m/[A-Za-z]/ already matches upper- and lowercase letters. Using a /i modifier here will only make your meaning unclear. "I am called MUBA, but you may also call me MUBA." Or: "I am called both uppercase and lowercase A-Z, but you may also call me both uppercase and lowercase A-Z."
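To make the difference visible, here is a small sketch with made-up sample lines. It shows a case-sensitive search finding only the language, a case-insensitive search finding everything, and /i changing nothing on a class that already lists both cases.

```perl
use strict;
use warnings;

my @lines = (
    "Perl is a language.",
    "Run perl -v to see the version.",
    "PERL is not how you write it.",
);

# Case matters: only the line about the language is found.
my @language = grep { m/\bPerl\b/ } @lines;

# Case ignored: all three spellings are found.
my @anything = grep { m/\bperl\b/i } @lines;

# [A-Za-z] already covers both cases, so /i selects exactly the same words.
my @words   = qw(MUBA muba Muba m00ba);
my @with_i  = grep { m/^[A-Za-z]+$/i } @words;
my @without = grep { m/^[A-Za-z]+$/  } @words;

print scalar(@language), " line(s) about the language\n";
print scalar(@anything), " line(s) mentioning perl in any case\n";
print "with /i:    @with_i\n";
print "without /i: @without\n";
```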

3. Do know that different people may write the same thing in different manners.

For example (and I will just ignore the last Don't from the next chapter): a zipcode like "1234 AB" may be written like "1234AB" or "1234 ab" or "1234ab" by others.
Don't throw an error message when "1234 AB" is written like "1234ab". Do just match case-insensitively (unless "1234 AB" actually is a different zipcode than "1234 ab") and do match the space zero or more times (\s*).

4. Don't make regexes too complex. Although regexes are powerful, it might be easier to read and maintain them when you split up your single regex into multiple regexes. • Or do use the /x modifier and give your regex a nice layout. • • Do use different lines (but aligned vertically) for search patterns and replace expressions when using s/// and tr/// or y///.
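A forgiving zipcode check along these lines might look like the sketch below. The sub name is hypothetical, and the "four digits, two letters" format is simply the one from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts "1234 AB", "1234AB", "1234 ab", "1234ab"
# and returns the canonical form "1234 AB"; returns undef for anything else.
sub normalise_zipcode {
    my ($input) = @_;
    return undef unless defined $input;
    if ($input =~ m/\A\s*([0-9]{4})\s*([A-Za-z]{2})\s*\z/) {
        return "$1 " . uc($2);
    }
    return undef;
}

for my $try ("1234 AB", "1234ab", " 1234  Ab ", "12345", "1234 A") {
    my $zip = normalise_zipcode($try);
    print "'$try' => ", (defined $zip ? $zip : "not a zipcode"), "\n";
}
```

Instead of /i I spelled out [A-Za-z] here; either works, just don't use both at once.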

s/
    (                               # Match and backreference to either
        (?:
            Perl|                   # one or
            perl|                   # another way
            PERL|                   # to spell
            [Pp][Ee][Rr][Ll]        # perl
        )|                          # or
        (?:
            Java|                   # one or
            java|                   # another way
            JAVA|                   # to spell
            [Jj][Aa][Vv][Aa]        # java
        )
    )
/                                   # and replace it with
    lc($1) eq "perl" ?
        "$1 looks like a nice language to me"   # this when perl is found
    :lc($1) eq "java"?
        "I don't know $1 very well"             # or this when java is found
    :"I do not wish to consider $1"             # or this when something else is found.
/ex;

(Yes, I know it is redundant to first check all common possibilities (Perl, perl, PERL) and then check all possibilities, but I had to make it complex, right? Besides, I am not perfect either. And note: this regexp really sucks. Read on to see demerphq's comment on it.)
This way, you can easily see what you are looking for, how things are related to each other and how replacement is done.
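The other option from this rule, splitting one regex into several smaller ones, can be sketched with qr//. This is only an illustration with names I made up, and it uses (?i:...) rather than spelling out every case by hand.

```perl
use strict;
use warnings;

# Small, named building blocks...
my $perl = qr/(?i:perl)/;
my $java = qr/(?i:java)/;

# ...combined into the regex that is actually used.
my $language = qr/\b($perl|$java)\b/;

# Hypothetical helper that annotates every language name in a text.
sub comment_on {
    my ($text) = @_;
    $text =~ s{$language}{
        lc($1) eq "perl" ? "$1 (a nice language)"
                         : "$1 (which I don't know very well)"
    }ge;
    return $text;
}

print comment_on("I like Perl and JAVA"), "\n";
```

Each piece has a name that says what it is for, so a reader does not have to decode the whole pattern at once.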

5. Do know what your regex really means. • Do know about ^, $, variable interpolation, the matching rules as described by the Camel Book, modifiers, and the meaning of \n (newline) in combination with ., ^ and $

In other words, RTFM. Get a driving license before driving a car. Know about politics before standing for president. Know where you want to get before entering an airplane. Check if you are entering the right airplane before you enter it.
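For those who want a taste before reading TFM, here is a small sketch of the things this rule mentions: $ and a trailing newline, ^ with and without /m, . with and without /s, and a variable interpolated into a pattern. All strings and variable names are just examples.

```perl
use strict;
use warnings;

my $line = "cool\n";
my $dollar_ok = $line =~ m/cool$/  ? 1 : 0;   # $ also matches just before a final newline
my $z_ok      = $line =~ m/cool\z/ ? 1 : 0;   # \z matches only at the very end

my $text  = "first\nsecond\n";
my @plain = $text =~ m/^(\w+)/g;     # ^ matches at the start of the string only
my @multi = $text =~ m/^(\w+)/mg;    # with /m, ^ matches at the start of every line

my $dot_plain = "a\nb" =~ m/a.b/  ? 1 : 0;    # . does not match a newline
my $dot_s     = "a\nb" =~ m/a.b/s ? 1 : 0;    # with /s it does

my $needle  = "a.b";
my $interp  = "axb" =~ m/$needle/     ? 1 : 0;   # the . is still a metacharacter
my $literal = "axb" =~ m/\Q$needle\E/ ? 1 : 0;   # \Q...\E makes it a literal dot

print "\$ / \\z:         $dollar_ok $z_ok\n";
print "^ without /m:   @plain\n";
print "^ with /m:      @multi\n";
print ". without/with /s: $dot_plain $dot_s\n";
print "interpolated / quoted: $interp $literal\n";
```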

6. Do know about precedence.

Do know the difference between m/^a|b$/ and m/^(?:a|b)$/ (or m/^(a|b)$/ for short).
Do know the difference between m/(ab)/ and m/(?:ab)/. m/^a|b$/ will match either an "a" at the beginning of the string, or a "b" at the end, or nothing at all if both are absent. This is right if this is what you want. But say you want to match either an "a" or a "b", which should be the first and last letter of the string, you should use m/^(?:a|b)$/. The ?: makes sure no backreference (\1, \2, ..., $1, $2, ...) is made.
If, however, you want a string to begin with an "a" or a "b", followed by some (0..n) letters and end with the character it started with, use: m/^(a|b)[A-Za-z]*\1$/.
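A quick way to see the difference is to run all three patterns over the same list of words. The word list below is made up for the occasion.

```perl
use strict;
use warnings;

my @words = qw(a b ab apple crab abba banana);

# "a" at the start OR "b" at the end.
my @loose = grep { m/^a|b$/ } @words;

# The whole string is exactly "a" or exactly "b".
my @exact = grep { m/^(?:a|b)$/ } @words;

# Starts with "a" or "b" and ends with that same character (\1).
my @same_ends = grep { m/^(a|b)[A-Za-z]*\1$/ } @words;

print "^a|b\$:               @loose\n";
print "^(?:a|b)\$:           @exact\n";
print "^(a|b)[A-Za-z]*\\1\$:  @same_ends\n";
```

Note that the last pattern needs at least two characters, because the captured letter has to appear once at the start and once more at the end.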

7. Do know what data to expect.

Besides Timtoady (TIMTOWTDI (for the uninitiated: There Is More Than One Way To Do It. This actually is the slogan of Perl)), 'regexpect' is another nice keyword to remember: regular expression - expectation. Always keep in mind what data you expect while creating or modifying a regex.

8. Don't use modifiers for tr/// or y/// which are only useful for m// and/or s///. Also don't use modifiers for m// which are only useful for s///.

In other words, do know about modifiers. Do understand them. Do know how to use them. Do know the difference between tr///, y///, m// and s///. In other words, Do RTFM.
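To give an idea of which modifier belongs where, here is a short sketch. It only uses /c, /d and /s for tr///, /i and /g for m//, and /e for s///; the sample strings are arbitrary.

```perl
use strict;
use warnings;

my $text = "Hello, World";

# tr/// with /c (complement) and /d (delete): drop everything that is not a letter.
(my $letters = $text) =~ tr/A-Za-z//cd;

# tr/// with an empty replacement list and no modifiers only counts characters.
my $vowels = ($text =~ tr/aeiouAEIOU//);

# tr/// with /s (squeeze): collapse runs of the same character.
(my $squeezed = "aaabbbccc") =~ tr/a-z//s;

# m// with /i and /g: case-insensitive, and all matches in list context.
my @ls = $text =~ m/L/ig;

# s/// with /e: the replacement is evaluated as Perl code.
(my $doubled = "3 apples") =~ s/(\d+)/$1 * 2/e;

print "$letters, $vowels vowels, $squeezed, ", scalar(@ls), " l's, $doubled\n";
```

tr/// works on characters, not patterns, which is why it has its own small set of modifiers.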

9. Don't untaint tainted values with a statement like ($untainted) = $tainted =~ m/(.*)/;.

I see little reason in doing so. There might be a situation, though, that this is ok, for example if you're absolutely sure the data comes from a trusted source.
But in most cases, it's like pumping air into your bicycle tire, then riding straight through shattered glass. It would be a better idea to avoid the glass and ride around it; it would be better to be more careful. Well, the same story goes for taint mode. First, you take the effort to be wise and use taint mode. Then, you just let insecurity slip into your program (unless you trust the source).
It's better to untaint data you've checked. For example, when you expect a two character string starting with a vowel, make sure that is the thing you untaint: m/^[aeiou].$/i. Now, malicious data is less able to slip in.
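As a sketch of the "check first, then untaint" idea: in taint mode a value captured by a regex group counts as untainted, so the capture should only ever contain what you have validated. The sub name below is hypothetical, and the code runs the same with or without -T.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value only when it is exactly two
# characters and starts with a vowel; otherwise returns undef.
sub checked_value {
    my ($input) = @_;
    return undef unless defined $input;
    my ($clean) = $input =~ m/\A([aeiou].)\z/i;
    return $clean;    # undef when the check failed
}

for my $try ("ab", "Ox", "xb", "abc") {
    my $clean = checked_value($try);
    print "'$try' => ", (defined $clean ? "accepted as '$clean'" : "rejected"), "\n";
}
```

Compare this with m/(.*)/, which "validates" everything and therefore nothing.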

Validating addresses

1. Don't use regexes to validate e-mail addresses.: You can do so, only if you know the current specification very well and then only if you update your regex whenever the specification changes. Otherwise, users will try to enter their e-mail address (which is perfectly valid according to the current specification but not according to your regex) and will receive an error message.
There are regexes out there that do a fine job validating an e-mail address. They take quite some lines of code and even then they are not able to be entirely correct.
2. Don't use regexes to validate HTTP, FTP or other types of internet addresses.: You can do so, only if you know the current specification very well and then only if you update your regex whenever the specification changes. Otherwise, users will try to enter a home page address (which is perfectly valid according to the current specification but not according to your regex) and will receive an error message.
3. Don't use regexes to validate zipcodes when you accept international users.: Zip code syntax may vary in different countries.

Conclusion: because regexes are so powerful, people are tempted to use them to validate all types of addresses. Don't. You won't know the current specification of the particular address.

Language checking

1. Don't use regexes to validate HTML, XHTML, XML and the like.: Others have already proven regexes are not up to the job.
Use one of the many HTML modules out there.
2. Don't use regexes to extract information from these types of files.: There are nice modules available that can do it for you, even better than you. Use them.

Credits

I think a word of thanks is in order here. For their very welcome help and the time they spent reading and commenting on this article, I would like to thank the people who gave me useful advice and critical comments. Inspiration came from users who made stupid, unnecessary mistakes (yes, I am one of them).
I received comments from Gumpu, Dietz, sporty, BrowserUk and bart.
After that, I also got useful replies from nobull, demerphq and schodckwm.

"2b"||!"2b";$$_="the question"


Replies are listed 'Best First'.
Re: [Try-out] Regexp do's and don'ts by Dietz (Curate) on Aug 15, 2004 at 09:56 UTC
`5. Do know what your regex really means. • Do know about ^, $, variable interpolation, the matching rules as described by the Camel Book, modifiers, and the meaning of \n (newline) in combination with ., ^ and $`
Do know about precedence since disregarding it is one of the highest crimes in regex country: I've made a sample common mistake not paying attention to precedence and I use this as a chance to pillory myself giving a perfect example of what not to do. In node Re: Short or Long Hand I was using an alternation being based on anchors: `/^0|6$/` This simply says match 0 at the beginning or match 6 at the end while my intention was to match 0 or 6 ranging from beginning to the end of the string: `/^(?:0|6)$/`
As an addition to your tutorial I'd like to see the basic requirement for regexes: • You can't write an efficient regex as long as you don't know what your expected data will be: Always think of the expected data while changing or simplifying regexes. I personally like the term 'regexpected'. Keep it in mind and it will save your life ;-)
Please feel free to downvote node Re: Short or Long Hand
/me castigating myself for not paying attention to precedence
Re: [Try-out] Regexp do's and don'ts by gumpu (Friar) on Aug 15, 2004 at 09:37 UTC
Hoi, Good stuff! Has the potential to be very useful for newbies. One point: "5. Don't use regexes for formats without a definite syntaxis, like human language. Regexes are good for pattern matching (and substitution), not for language analyzing." Think you have to be a bit more specific here. Computer languages have a definite syntax but using regular expressions to parse them can be a very painful process. (I know this from experience cause I once tried to make a code beautifier for C++ and Pascal). In those cases a proper parser (say Parse::RecDescent) is much better. If you are just trying to find simple things in source code, for instance `#include` statements or simply formatted comment blocks, regular expressions would be fine. Have Fun
Re^2: [Try-out] Regexp do's and don'ts by Anonymous Monk on Sep 26, 2004 at 15:29 UTC
as long as you don't define better as faster, yes, it is much better :D
Re: [Try-out] Regexp do's and don'ts by exussum0 (Vicar) on Aug 15, 2004 at 14:37 UTC
Do give examples for any point you ever make when writing these types of documents. In some ways, you are giving advice, but if you give reasons with solid examples and counter examples, you make it that much stronger. Points 1 and 2 are easy. Using /i is an efficiency thing. Show benchmarks. If the benchmarks show no difference, then the point isn't valid. Point 7 ticked me off at a particular company, where a few people would do just that. Show a good example, maybe with a system call or file handle that shows how this, as an exploit, would work. It's the difference between "don't smoke" and "don't smoke, it increases your chances of cancer"
---- Then B.I. said, "Hov' remind yourself nobody built like you, you designed yourself"
Re: [Try-out] Regexp do's and don'ts by BrowserUk (Patriarch) on Aug 15, 2004 at 15:49 UTC
In my (somewhat devalued) opinion, there is only one DO... and one DONT... worthy of note. DON'T tell others what they should or should not do. DO explain to others why you choose to do (or not) certain things (in particular ways). For bonus points, also explain why you (or others) might choose to use the proscribed behaviour (or not use the advised behaviour) under some circumstances. Make particular emphasis upon explaining the deciding factors that would sway your decision against your norm.
Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re: [Try-out] Regexp do's and don'ts by demerphq (Chancellor) on Mar 28, 2005 at 10:55 UTC
You should recommend people avoid constructs like: `[Jj][Aa][Vv][Aa]` as they are quite inefficient and also can blow out various optimizations just by their presence. Its better to write that (?i:Java). Also up until 5.9.2 perl doesnt optimise alternations very well so its advisable to use modules like Regexp::List or the like to preprocess `/Lists|of|words/`. OTOH as of 5.9.2 perl _does_ optimize them so using things like Regexp::List will only slow down your patterns (im hopeful by 5.10 these modules will be updated to Do The Right Thing Regardless™). In fact if at all possible after that version it is recommended that you use alternations instead of using quantifier, bracketing. Ie, `/(cars|cart|carry|car)/` will be more efficient than `/(car([st]|ry)?)/` as of 5.9.2, and in some circumstances massively more efficient. I admit i wrote the optimization so im tooting my own horn here a bit. :-) But it is worth realizing that alternations in later perls can be significantly faster than other hypothetically equivalent patterns.
--- demerphq
Re^2: [Try-out] Regexp do's and don'ts by muba (Priest) on Mar 28, 2005 at 11:24 UTC
I am fully aware of the fact that `m/[Jj][Aa][Vv][Aa]/` sucks like hell. I just needed a "complex" regex which had a clear goal, in order to demonstrate multi-lining regexes. But I'll add a note: don't try this at home :)
`"2b"||!"2b";$$_="the question"` Besides that, my code is untested unless stated otherwise. One more: please review the article about regular expressions (do's and don'ts) I'm working on.
Re: [Try-out] Regexp do's and don'ts by muba (Priest) on Aug 15, 2004 at 11:06 UTC
Ah! Two wonderful reactions already so far! Gumpu, yes, you're definitely right about that one. The next version will have it updated. Dietz, 'Regexpected'... nice word! That is indeed what I meant to say but I will clarify it and use that word ;) And precedence... yes, I will also mention that in the next version. Thank you so far. Any more comments? I'd like to hear them!
`"2b"||!"2b";$$_="the question"`
Re: [Try-out] Regexp do's and don'ts by ikegami (Patriarch) on Sep 25, 2004 at 06:28 UTC
Good work! I found two important missing items: Do use `(?:...)` when `(...)` isn't necessary. Do avoid `` $` ``, `$&` and `$'`, explaining why, and showing alternatives for use with and without /g.
Re: [Try-out] Regexp do's and don'ts by nobull (Friar) on Mar 27, 2005 at 22:13 UTC
Good document. I can hardly fault it technically. A few suggestions (I am a native speaker of (British) English):
"look how cute, he's telling the obvious." -> "look how cute, he's stating the obvious."
"to search global" -> "to search globally"
The plural of 'regex' is sometimes written as 'regexen'. This is not a standard way of making a plural in English but it is still seen quite often.
The wisdom of using strictures and taint mode has little to do with regexes. The whole section "Rules of Thumb" should be introduced as general Perl programming advice not RE specific advice. Perhaps even split this section off to a separate node.
"tainted date" -> "tainted data"
"they are/behave malicious": Can't make this work comfortably in English. They are malicious or they behave maliciously.
"Don't trust users." -> "...or programs under their control."
"getting president" -> "standing for president"
"Check if you enter the right airplane before entering it." -> "Check if you are entering the right airplane before you enter it."
"if both are not present.": Sorry the precedence of 'not' here is ambiguous in written English. (In spoken English it would be possible to disambiguate with intonation). "if neither are present." "unless both are present." "if both are absent."
`($untainted) = $tainted =~ m/(.)/g;` -> `($untainted) = $tainted =~ m/(.)/;` (And you just finished warning people not to use redundant qualifiers) :-)
"This way, you only show you don't know why one would use taint mode and you make taint mode useles for your script.": This could be considered insulting. There's nothing wrong with unconditionally untainting data that is known for certain to come from a trusted source.
Re^2: [Try-out] Regexp do's and don'ts by muba (Priest) on Mar 28, 2005 at 11:21 UTC
"Good document. I can hardly fault it technically. A few suggestions (I am a native speaker of (British) English):" Alright, thank you! I edited the Original Post in order to use most of your suggestions.
"`($untainted) = $tainted =~ m/(.)/g;` `($untainted) = $tainted =~ m/(.)/g;` (And you just finished warning people not to use redundant qualifiers) :-)" Oh, I had a really hard time finding out what the difference between the two lines of code is. But indeed. I altered it, so now it is `($untainted) = $tainted =~ m/(.*)/;` :) Well, thanks!
`"2b"||!"2b";$$_="the question"` Besides that, my code is untested unless stated otherwise. One more: please review the article about regular expressions (do's and don'ts) I'm working on.
Re: [Try-out] Regexp do's and don'ts by muba (Priest) on Aug 16, 2004 at 11:24 UTC
Original content (version 0.2, 0.2.1) deleted and moved to root node. Old version 0.1 deleted from the thread altogether. Sorry for the pollution.
Re^2: [Try-out] Regexp do's and don'ts by Sameet (Beadle) on Aug 16, 2004 at 13:45 UTC
Thanks MUBA, I was actually searching something on Perl's Regular Expressions. This node has been God (read "Perl")sent to me. Regards Sameet
Re: [Try-out] Regexp do's and don'ts by ww (Archbishop) on Sep 24, 2004 at 19:38 UTC
one reader's take: you have the makings of a very good regex article here...
my quibbles: the editorial matter (re strict, warnings, etc) before you get to regex issues might well be split off -- perhaps multiple splits, as others have suggested. Seems to me that the title would be annoyingly misleading, otherwise.
As for your use of English -- you have used some constructs that vary from the forms used by those whose first language is "American English" but almost none (see first list item below) of them obstruct understanding or present any serious obstacle to "ease of reading."
coupla' specifics: (updates, 20050328 and new language suggestions below (waaay! far down))
Has been addressed: ~~para 4: "lined out equally" might be more colloquially phrased "aligned vertically." (problem is that "lined out" is synonymous with "struck out.")~~
Suggest you expand para 8 with some examples, explanations. Advising one to "RTFM" is sometimes all well and good, but in an article with a tutorial intent, it seems to me to verge on rudeness to the reader (and lest that be a mystery, because (1) it's a contraction for a phrase which uses a word offensive to some and (2, more important) TFM is extremely dense and sometimes -- depending on the reader's learning style -- difficult to absorb). You might consider offering links to several choices of explanatory material including but not limited to Owl, Friedl, TFM and others (Yes, I'm one such and would love to find the "other" that sings to me.)
repeated below: ~~"syntaxis" ??? Unfamiliar, not found in a quick dictionary check. Suspect you intend "syntax."~~
Suggested addition: brief discussion of the ephemeral character of $1 ... (ie, reset issues). $1 is mentioned in para 6.
Hope this is some help. Please, drive on with your good work! ++
Re: [Try-out] Regexp do's and don'ts by muba (Priest) on Aug 15, 2004 at 21:42 UTC
Two more wonderful notes. Sporty, you're right. I will add more ex(ample|planation) in the next version. I also think this will increase the pleasure people will find when reading the article. BrowserUk, yes, I agree with you. It's better not to tell what others SHOULD do or SHOULD NOT do, but that makes a less interesting text in my opinion. But, as I already said, I will add examples in the next version. Thank you for the useful comments and criticism, and for the time you spent to help me. I still welcome more comments.
Especially, I want to remention my wish to fill in the module and link gaps. I do know I can't expect you to do my work (for it is my work) but I would really like to see that someone finds the modules and pages I'd like to refer to. Within a couple of weeks, I will be able to do that myself but for now I am stuck with an expensive dial-up connection (the phone company charges the phone ticks :( ). Furthermore, I'd like to mention your names in a "word of thanks" chapter. By default, I assume you don't have problems with that but if you do, please let me know.
`"2b"||!"2b";$$_="the question"`
Re: [Try-out] Regexp do's and don'ts by ww (Archbishop) on Mar 28, 2005 at 13:58 UTC
muba: good stuff. You may wish to consider the following (mostly minor and occasionally open to debate) re the idiom or syntax:
In Introduction, "Note: this is not a regex tutorial or regex howto." (emphasis supplied) s/or /nor/ likewise, s/If you may ever find /If you ever find / (~~may~~)
Jargon: "Before I finnaly start off, let's set some terminology." -- for spelling change to "finally"; for idiom: just omit it entirely.
Rules of Thumb 2: I'm intruding into content here, but I'm troubled by the statement, "when input from external sources may be unsafe." My view: input from external sources is ALWAYS unsafe... even if it's coming from me. No malice is required: "Fat fingeritis" can wreak havoc!
RoT 2: "...etc) is considered 'tainted'." s/is/are/ for subject-verb agreement in quantity; typo: "Also, thinks like" s/thinks/things/. also in RoT 2, for brevity: "There are several ways to untaint data, which I am not about to mention here. You should check the above mentioned Perl Security (perlsec) manpage." could be written, "There are several ways to untaint data for which you should check the above-mentioned...."
RoT 3: "They are ignorant or else they are malicious." would be less globally applicable to (all) users if you said, "Some are ignorant; some are malicious." (As written, the current phrase indicts ALL users.) and "...number from 1 to 5, including, you..." s/including/inclusive/ "On the other hand," means (in this usage) that what follows is intended as a counter-example, whereas what actually follows is a supporting or additional example. Suggest one way to improve it would be to drop the quoted phrase, or (and the grammar stiffs will object to this) replace "OTOH" with "Or". typo: s/easiliy/easily/
RoT 5. "syntaxis" -- I think you want "syntax" and "analysis" instead of "analyzing."
RoT 6. spelling: s/shuld/should/
RoT 7. "Do use CGI; (have..." might be clearer if you were to say "Do use CGI:; (have..." or, even better, if you specified the module by its full name and: "CGI offers you a great amount of functions" can be better phrased "CGI offers you many functions" and in "...as good as the module's author." s/good/well/ (good is an adjective, well is the adverb form).
If you find these useful (msg me), I'll carry on with the rest of the document. Again, ++
Re^2: [Try-out] Regexp do's and don'ts by muba (Priest) on Mar 28, 2005 at 21:11 UTC
Thank you! I used all but one of your suggestions to improve this document. Most things you pointed out were stupid mistakes (which I of course fixed), others were things I'd never have found out. Again, thank you!
`"2b"||!"2b";$$_="the question"` Besides that, my code is untested unless stated otherwise. One more: please review the article about regular expressions (do's and don'ts) I'm working on.
Regexp Legibility by patrickhaller (Initiate) on Jun 14, 2009 at 05:21 UTC
I used to (before use strict) compose regexps by hiding the components inside an if, e.g.:
$complex_re = qr/^($ip) ($host) ($msg)$/ if (
    $ip   = qr/\d+\.\d+\.\d+\.\d+/,
    $host = qr/[\-\.\w]+/,
    $msg  = qr/./
);
Nowadays, I use strict and eval, e.g.
my $complex_re = eval {
    my $ip   = qr/\d+\.\d+\.\d+\.\d+/;    # dotted-quad IP address
    my $host = qr/[\-\.\w]+/;             # host name
    my $msg  = qr/./;                     # message
    return qr/^($ip) ($host) ($msg)$/;
};
We pay about a 5% penalty for the eval when we use this in a tight loop, however we can solve that by moving the regexp creation outside the loop.
                 Rate    with eval without eval
with eval    116279/s           --          -5%
without eval 121951/s           5%           --
Patrick
Re: Regexp Legibility by ambrus (Abbot) on Jun 14, 2009 at 14:44 UTC
I think you should use `do {` instead of `eval {` there.
Re^2: Regexp Legibility by patrickhaller (Initiate) on Jun 21, 2009 at 16:33 UTC
Looks like eval runs faster...
            Rate as usual  with do
as usual 63091/s       --     -29%
with do  88496/s      40%       --

             Rate with eval  as usual
with eval 59773/s        --       -9%
as usual  65359/s        9%        --
Re^3: Regexp Legibility by linuxer (Curate) on Jun 21, 2009 at 16:47 UTC
Re^3: Regexp Legibility by Anonymous Monk on Jun 21, 2009 at 16:45 UTC