PerlMonks  

RegExp help

by heatblazer (Scribe)
on Mar 26, 2012 at 13:55 UTC

heatblazer has asked for the wisdom of the Perl Monks concerning the following question:

Hello again, monks. This time I need some help with regular expressions. Lame of me, but I still can't grasp the idea behind them... I would like a short, simple explanation built around a few of my own regex needs. Here is one: validate an email address, but the check should catch things like:
*johnsmith@yahoo.com
hello12@.com
hello@gmail..com
etc. Please write me a good example with appropriate comments, so I can see how these regexes work, since I just can't get it.
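Since a commented example was asked for, here is a minimal sketch of a deliberately simplified address check written with the `/x` modifier, so every piece can carry its own comment. It is an assumption-laden illustration, not a real validator: it allows only a small set of characters and is stricter than the email RFCs (for instance, `*johnsmith@yahoo.com` is rejected here even though it is legal syntax). The variable name `$simple_email` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical example pattern: a deliberately simplified check,
# NOT a full RFC 5322 validator.
my $simple_email = qr{
    \A                            # start of the string
    [A-Za-z0-9_%+-]+              # local part: first chunk of letters, digits, _ % + -
    (?: \. [A-Za-z0-9_%+-]+ )*    # more chunks, each after exactly one dot
    \@                            # the at-sign, escaped so Perl does not try to interpolate an array
    [A-Za-z0-9-]+                 # first domain label, must be non-empty
    (?: \. [A-Za-z0-9-]+ )+       # at least one more label, each after exactly one dot
    \z                            # end of the string, with no trailing newline allowed
}x;

for my $addr ('john.smith@yahoo.com', '*johnsmith@yahoo.com',
              'hello12@.com', 'hello@gmail..com') {
    # The match returns true or false; pick a label accordingly.
    my $verdict = $addr =~ $simple_email ? 'accepted' : 'rejected';
    print "$addr: $verdict\n";
}
```

Only `john.smith@yahoo.com` is accepted: `hello12@.com` fails because the first domain label must contain at least one character, and `hello@gmail..com` fails because every dot must be followed by a non-empty label.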

Well, thanks for the replies. I am getting the hang of it by reading some books. Here is one of my tests, a simple HTML tag check:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Print every line of DATA that contains an <hr> tag this pattern accepts.
while (<DATA>) {
    if ( /<hr( +size *=? *[0-9]+ *(\/>|>)| +\/>|>)/i ) {
        print;
    }
}

__DATA__
<hr size==12 />
<hr size = 12 />
<hr size=14>
<hr>
<hr >
<hr />
<hr size />
```

This is a fairly good match I've made for horizontal-rule (<hr>) tags with regexes :)
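Run on the sample lines, the pattern as posted prints four of the seven: `<hr size = 12 />`, `<hr size=14>`, `<hr>` and `<hr />`. It skips `<hr size==12 />` and `<hr size />`, but it also skips `<hr >`, because none of the three alternatives allows spaces before a plain `>`; and since `=?` makes the equals sign optional, it would also accept `<hr size 12>`. A sketch of the same pattern spread out with `/x` may make each part easier to follow. Assumptions: the capturing groups are replaced by non-capturing `(?:...)` groups and `{}` delimiters remove the need for `\/`, which does not change what matches; under `/x` a literal space has to be written as `[ ]`.

```perl
use strict;
use warnings;

# The pattern from the post above, rewritten with /x so each piece gets a comment.
my $hr_tag = qr{
    <hr                      # the tag name, case-insensitive because of the /i flag
    (?:
        [ ]+ size            # one or more spaces, then the attribute name
        [ ]* =? [ ]*         # optional '=' with optional spaces around it
        [0-9]+               # the size value: one or more digits
        [ ]* (?: /> | > )    # close as '/>' or '>'
      |
        [ ]+ />              # or: spaces, then a self-closing '/>'
      |
        >                    # or: '>' immediately after 'hr'
    )
}xi;

for my $tag ('<hr size==12 />', '<hr size = 12 />', '<hr size=14>',
             '<hr>', '<hr >', '<hr />', '<hr size />') {
    printf "%-18s %s\n", $tag, ($tag =~ $hr_tag ? 'match' : 'no match');
}
```

Allowing `<hr >` would take an extra `[ ]*` before the plain `>` alternative, and dropping the `?` after `=` would make the equals sign mandatory.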

Re: RegExp help
by dorko (Prior) on Mar 26, 2012 at 14:17 UTC
    If you're not good at regexes, I wouldn't suggest trying to parse email addresses with them. It's harder than it looks, and you can get yourself into trouble using regular expressions like that.

    Instead, try something like Email::Valid. From the docs for Email::Valid:

```perl
use Email::Valid;

print( Email::Valid->address('me@example.com') ? 'yes' : 'no' );
```

    Cheers,

    Brent

    -- Yeah, I'm a Delt.

      Thank you, but I'll try learning regexes anyway, because I just want to know them. I will keep the module you suggested in mind.

Re: RegExp help
by stevieb (Canon) on Mar 26, 2012 at 14:32 UTC

      Thank you, I am reading these right now.

Re: RegExp help
by Anonymous Monk on Mar 26, 2012 at 14:15 UTC
Re: RegExp help
by JavaFan (Canon) on Mar 26, 2012 at 15:47 UTC
    You do know that *johnsmith@yahoo.com is actually valid syntax, don't you?

    Here's a regexp:

```perl
use strict;
use warnings;
use 5.010;

my $pat = qr {
  (?(DEFINE)
    (?<address>         (?&mailbox) | (?&group))
    (?<mailbox>         (?&name_addr) | (?&addr_spec))
    (?<name_addr>       (?&display_name)? (?&angle_addr))
    (?<angle_addr>      (?&CFWS)? < (?&addr_spec) > (?&CFWS)?)
    (?<group>           (?&display_name) : (?:(?&mailbox_list) | (?&CFWS))? ;
                        (?&CFWS)?)
    (?<display_name>    (?&phrase))
    (?<mailbox_list>    (?&mailbox) (?: , (?&mailbox))*)

    (?<addr_spec>       (?&local_part) \@ (?&domain))
    (?<local_part>      (?&dot_atom) | (?&quoted_string))
    (?<domain>          (?&dot_atom) | (?&domain_literal))
    (?<domain_literal>  (?&CFWS)? \[ (?: (?&FWS)? (?&dcontent))* (?&FWS)?
                        \] (?&CFWS)?)
    (?<dcontent>        (?&dtext) | (?&quoted_pair))
    (?<dtext>           (?&NO_WS_CTL) | [\x21-\x5a\x5e-\x7e])

    (?<atext>           (?&ALPHA) | (?&DIGIT) | [-!#\$%&'*+/=?^_`{|}~])
    (?<atom>            (?&CFWS)? (?&atext)+ (?&CFWS)?)
    (?<dot_atom>        (?&CFWS)? (?&dot_atom_text) (?&CFWS)?)
    (?<dot_atom_text>   (?&atext)+ (?: \. (?&atext)+)*)

    (?<text>            [\x01-\x09\x0b\x0c\x0e-\x7f])
    (?<quoted_pair>     \\ (?&text))

    (?<qtext>           (?&NO_WS_CTL) | [\x21\x23-\x5b\x5d-\x7e])
    (?<qcontent>        (?&qtext) | (?&quoted_pair))
    (?<quoted_string>   (?&CFWS)? (?&DQUOTE) (?:(?&FWS)? (?&qcontent))*
                        (?&FWS)? (?&DQUOTE) (?&CFWS)?)

    (?<word>            (?&atom) | (?&quoted_string))
    (?<phrase>          (?&word)+)

    # Folding white space
    (?<FWS>             (?: (?&WSP)* (?&CRLF))? (?&WSP)+)
    (?<ctext>           (?&NO_WS_CTL) | [\x21-\x27\x2a-\x5b\x5d-\x7e])
    (?<ccontent>        (?&ctext) | (?&quoted_pair) | (?&comment))
    (?<comment>         \( (?: (?&FWS)? (?&ccontent))* (?&FWS)? \) )
    (?<CFWS>            (?: (?&FWS)? (?&comment))*
                        (?: (?:(?&FWS)? (?&comment)) | (?&FWS)))

    # No whitespace control
    (?<NO_WS_CTL>       [\x01-\x08\x0b\x0c\x0e-\x1f\x7f])

    (?<ALPHA>           [A-Za-z])
    (?<DIGIT>           [0-9])
    (?<CRLF>            \x0d \x0a)
    (?<DQUOTE>          ")
    (?<WSP>             [\x20\x09])
  )

  (?&address)
}x;

while (<DATA>) {
    chomp;
    say $_ if /^$pat$/;
}

__DATA__
*johnsmith@yahoo.com
hello12@.com
hello@gmail..com
```
    This will print *johnsmith@yahoo.com as this is the only entry that's syntactically correct.

      Thanks for the awesome example; however, it's too much for me to understand yet.

        There's little I can offer to make it more understandable: the syntax of email addresses is complex. Just be glad that we're living in a post-5.10 world: now we can use rules and recursion, which allow us to translate BNF grammars into regular expressions almost mechanically. In one (both?) of the editions of "Mastering Regular Expressions", Jeffrey Friedl gives a pre-5.10 regular expression to match email addresses. That one is far, far more complex (and doesn't allow comments nested beyond a certain depth (2, IIRC)).

        You may want to look at RFC 822, or one of its descendants, for the grammar of email addresses. It's my understanding that the regexp I gave was constructed from the grammar given in one of the RFCs. (I don't recall which one, and the file t/re/reg_email.t doesn't say where it comes from.)
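To see what "rules and recursion" means on a small scale, here is a cut-down grammar in the same style: named rules declared inside `(?(DEFINE)...)` and invoked with `(?&name)`. This is a hypothetical sketch, not a replacement for the full pattern: it only handles dot-atom addresses with an optional trailing comment (no quoted strings, domain literals, display names or folding whitespace), and `$mini` and the rule names are illustrative. The `comment` rule calls itself, which is what lets it match comments nested to any depth.

```perl
use strict;
use warnings;
use 5.010;   # (?(DEFINE)...), (?<name>...) and (?&name) need Perl 5.10 or later

# Hypothetical, simplified grammar: dot-atom addresses plus an optional (nested) comment.
my $mini = qr{
    (?(DEFINE)
        (?<atext>         [A-Za-z0-9!#\$%&'*+/=?^_`|~-] )
        (?<dot_atom_text> (?&atext)+ (?: \. (?&atext)+ )* )
        (?<addr_spec>     (?&dot_atom_text) \@ (?&dot_atom_text) )
        (?<comment>       \( (?: [^()\\] | \\. | (?&comment) )* \) )
    )
    ^ (?&addr_spec) (?: [ ]* (?&comment) )? \z
}x;

for my $s ('john.smith@example.com', '*johnsmith@yahoo.com', 'hello12@.com',
           'hello@gmail..com', 'john@example.com (work (main) account)',
           'john@example.com (unbalanced') {
    printf "%-8s %s\n", ($s =~ $mini ? 'valid' : 'invalid'), $s;
}
```

Each rule reads much like a line of the RFC grammar, which is why the full pattern above could be written almost mechanically; without `(?&comment)` calling itself, a pattern has to spell out each nesting level by hand.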

Node Type: perlquestion [id://961669]
Approved by Corion