PerlMonks  

Re^3: Please evaluate: RegEx for validating e-mail addresses

by davido (Cardinal)
on Sep 25, 2004 at 17:39 UTC (#393816)


in reply to Re^2: Please evaluate: RegEx for validating e-mail addresses
in thread Please evaluate: RegEx for validating e-mail addresses

The point that Ovid was making is that if you wish to use a regular expression to validate an email address's format, you've got to make a choice: Either use that whopping big regexp that he provided, or risk mistakenly rejecting valid addresses. If you don't mind rejecting valid addresses, go with a less intricate regular expression, but it won't be a robust solution.
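To make that trade-off concrete, here is a minimal sketch. It uses Python only because the regex idea is language-neutral and the original poster intends to port the code anyway. The pattern `SIMPLE` is a hypothetical "less intricate" expression of the kind often suggested, not one taken from this thread, and the sample addresses are made up for illustration. Each sample uses a local part that RFC 2822 syntax permits.

```python
import re

# Hypothetical "simple" pattern (an assumption for illustration, not from the thread):
# letters, digits, dot, underscore, hyphen, then '@', then a dotted domain.
SIMPLE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")

# Made-up addresses whose local parts are syntactically allowed by RFC 2822.
rfc_valid = [
    "fred&barney@example.com",   # '&' is a legal atom character
    "o'reilly@example.com",      # so is the apostrophe
    "user+tag@example.com",      # and '+'
    '"John Doe"@example.com',    # quoted-string local part with a space
]

def simple_ok(address):
    """Return True if the whole address matches the simple pattern."""
    return SIMPLE.fullmatch(address) is not None

# Collect the RFC-valid addresses that the simple pattern turns away.
rejected = [a for a in rfc_valid if not simple_ok(a)]

for address in rejected:
    print("simple pattern rejects a valid address:", address)
```

The simple pattern accepts an ordinary address such as `plain.user@example.com` but rejects every address in the list, which is exactly the risk described above.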


Dave


Replies are listed 'Best First'.
Re^4: Please evaluate: RegEx for validating e-mail addresses
by Limbic~Region (Chancellor) on Sep 25, 2004 at 19:08 UTC
    davido,
    While you and Ovid are both correct, context is important. All of those email addresses may be valid WRT the RFC, but they likely are not valid for a given MTA. It is also probable that an MTA accepts as valid some email addresses that the RFC categorizes as invalid. Having spent a couple of years managing an enterprise email gateway, I have seen what happens when two MTAs can't agree on what is and is not valid.

    My whole point is that, right or wrong, a simple regex can work for validating email addresses when what governs validity is the local environment and not the global RFC.

    Cheers - L~R

      L-R, I think you're confusing "stuff that microsoft produces" with actual Mail Transfer Agents.

      All internet MTAs that I'm aware of would have no problems understanding and delivering those addresses. Sendmail, qmail, postfix, exim, smail, qsmtpd, etc etc.

      I will stand firm on chastising anyone that narrows "valid email" down to their own limited view of email. Follow the RFC, or don't play at all. If you want to "embrace and extend", please do that in the privacy of your own cubicle, not out in public.

      I will also stand firm on chastising those (such as you, L-R) who support such narrow views. Please don't do that. It makes my job harder as well.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

        merlyn,
        Unfortunately I am no longer with the U.S. Dept. of Justice to give specific examples, but the list of all the internet MTAs you mentioned was in my experience quite limited. The gateway dealt with many hundreds of thousands of emails a day. The gamut of mail systems that they communicated with was quite large to include, but certainly not limited to, CC:Mail, (old) GroupWise, and your beloved Microsoft Exchange. Additionally, there are other prehistoric mail systems out there, homegrown ones, etc. that I assure you are not RFC compliant.

        When the problem was with a modern system, it was almost always due to administrative modification to prevent impersonation and open relaying. Since I can't give specific examples of MTAs that do not accept RFC compliant email addresses, http://www.rfc-ignorant.org/ is a site dedicated to listing domains that don't think the rules of the internet apply to them (postmaster, DSN, etc).

        Finally, it is arguable that I am using MTA (Message Transfer Agent) too liberally. For instance, in the case of CC:Mail and GroupWise, the internal mail system was not native SMTP, and so the gateway/MTA had to additionally translate the incoming address to the native format. Or perhaps, if the MTA is only accepting mail for a specific domain that happens to be a specific mail system, then it is not a true MTA. I will not argue except to say my original point was referring to specific situations. Admittedly, I should have been clear:

        • You are only validating "to" addresses for "incoming" emails
        • You are validating requests for new email accounts on your specific mail system
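As a rough sketch of the second case, a check governed purely by local policy might look like the following. Everything here is an assumption for illustration: the domain `mail.example`, the rule that account names start with a lowercase letter and contain only lowercase letters, digits, and dots, and the 3 to 20 character length limit. This is deliberately narrower than the RFC and is only meaningful for accounts created on that one system.

```python
import re

# Hypothetical local policy (assumed for illustration, not from any real system).
LOCAL_DOMAIN = "mail.example"
# Account name: starts with a lowercase letter, 3 to 20 characters in total,
# made up of lowercase letters, digits, and dots.
LOCAL_PART = re.compile(r"[a-z][a-z0-9.]{2,19}")

def acceptable_new_account(address):
    """Return True if the address fits the local account-naming policy."""
    # Split on the last '@'; sep is empty when there is no '@' at all.
    local, sep, domain = address.rpartition("@")
    if not sep or domain.lower() != LOCAL_DOMAIN:
        return False
    return LOCAL_PART.fullmatch(local) is not None

for candidate in ["jsmith@mail.example",
                  "fred&barney@mail.example",
                  "jsmith@elsewhere.example"]:
    print(candidate, acceptable_new_account(candidate))
```

Note that `fred&barney@mail.example` is turned down here even though its syntax is RFC-valid. That is acceptable only because the policy describes what this one system will create, not what counts as a valid address on the internet.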

        Now that the context is known, the possibilities I presented are pointless.

        Cheers - L~R

        Update: Word correction thanks to Albannach

      I do understand the point you're making; that sometimes there is a "good enough" solution, especially where a full RFC implementation seems like overkill in the context of a more closed environment.

      But you mentioned that you've seen what happens when two MTAs can't agree on what is and is not valid. I imagine that can be quite a problem. So why make things worse by supporting the building of solutions that fail to meet the RFC? If a particular network is going to send and receive email over the internet, or if a webpage is going to be validating email addresses that are to be valid on the internet, a mostly-correct solution is going to be somewhat-incorrect. I know that CGI Programming with Perl suggests a mostly-correct solution for matching email addresses. But there exist (in the CPAN modules) solutions that are fully correct. If writing one's own fully-compliant solution is too much work, there are always those modules.

      The OP suggested that he couldn't use modules because this ultimately will be ported to another language. ...well at least learn from the modules out there how to go about the task. Or invoke Perl from that other language's code to perform the test. ...or ask the question in a PHP-oriented group, where undoubtedly it has been answered before. ;)


      Dave

        Dave,
        "If a particular network is going to send and receive email over the internet..."

        DaWolf didn't indicate what purpose the email validator was going to be used for. That is why I said "context is important". I should likely have said that, without knowing the context, it is impossible to say how close to the RFC you need to get. I gave two examples in my reply to merlyn of where local context would be a factor, but I admit that it is just as probable that the form was to validate the "from" address of a form mailer and should be fully RFC compliant.

        Now that the context is known, the possibilities I presented are pointless.

        Cheers - L~R
