PerlMonks  

Re: Regex (lookahead) Confusion

by bart (Canon)
on Feb 05, 2004 at 20:57 UTC ( [id://326877]=note )


in reply to Regex (lookahead) Confusion

I think you're thinking of something like this:
/^(?:([smtwhfa])(?!.*\1))*$/

Replies are listed 'Best First'.
Re: Re: Regex (lookahead) Confusion
by allolex (Curate) on Feb 05, 2004 at 21:13 UTC

    You are *so* the monk!

    The regex works because of the variable length of the gap (anywhere from zero characters to the rest of the line) that .* allows before the backreference is matched. And lookaheads can be variable length.

    This is essentially the converse of the approach I was thinking of while trying to solve the problem. Someday...

    while (<DATA>) {
        chomp;
        if ( /^(?:([smtwhfa])(?!.*\1))*$/ ) {
            print "$_ : OK\n";
        }
        else {
            print "$_ : Not OK\n";
        }
    }
    __DATA__
    swma
    smqa
    smsa
    fhtm
    ttma
    t2ms
    __END__
    swma : OK
    smqa : Not OK
    smsa : Not OK
    fhtm : OK
    ttma : Not OK
    t2ms : Not OK

    --
    Allolex

      So let's see if I understand this and please correct me if I'm wrong.

      The ?: means that the parens are just for grouping and will load any matches into $1, $2, etc.
      Then the character class in parens
      and then the lookahead...
      ?! means that it is a negative lookahead
      .* means any character 0 or more times (very greedy)
      \1 is related to the character matched from the character class
      It's the .* that is throwing me off. To me, that looks like it would match a single character repeated any number of times but not separated duplicate characters. I just don't understand it ... yet. I will keep looking.

        s/will load/will not load/, right?

        The .* is saying there can be any number of characters in the string between the first match (the character class) and the backreference (\1) match, from zero (two characters next to each other) on up...

        Say our first match was 'a'; then 'a' is also what \1 refers to. So the lookahead is testing for a.*a: aa,  a.a,  a..a,  a...a,  a....a, etc. And it's a negative lookahead, so if any one of those combinations matches, the regex will fail.

        --
        Allolex
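        To make the lookahead step concrete, here is a minimal sketch. It spreads bart's regex out with /x comments and adds a hypothetical helper, trace (not part of anyone's post above), that asks the same question the negative lookahead asks at each position: does this letter show up again anywhere in the rest of the string?

```perl
use strict;
use warnings;

# bart's regex, spread out with /x so each piece can carry a comment.
my $no_repeats = qr/
    ^                  # start of string
    (?:                # group only, this pair of parens captures nothing
        ([smtwhfa])    # one allowed letter, captured as group 1
        (?! .* \1 )    # fail if that same letter appears anywhere later
    )*                 # repeat for every character in the string
    $                  # end of string
/x;

# Hypothetical helper for illustration: for each character, report
# whether it occurs again in the remainder of the string.
sub trace {
    my ($str) = @_;
    my @report;
    for my $i ( 0 .. length($str) - 1 ) {
        my $char    = substr $str, $i, 1;
        my $rest    = substr $str, $i + 1;
        my $verdict = $rest =~ /\Q$char\E/ ? 'repeats ahead' : 'no repeat ahead';
        push @report, sprintf "%s + '%s' => %s", $char, $rest, $verdict;
    }
    return @report;
}

for my $str (qw(swma smsa)) {
    printf "%s : %s\n", $str, $str =~ $no_repeats ? 'OK' : 'Not OK';
    print "    $_\n" for trace($str);
}
```

        For smsa the very first letter already reports a repeat ahead, which is the point where the negative lookahead makes the whole match fail.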

Re: Re: Regex (lookahead) Confusion
by ChrisR (Hermit) on Feb 05, 2004 at 21:09 UTC
    Bart, you're a genius!! That works perfectly. Now I'll just have to figure out how and why it works.
      perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(q/^(?:([smtwhfa])(?!.*\1))*$/)->explain'

      The regular expression:

      (?-imsx:^(?:([smtwhfa])(?!.*\1))*$)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        ^                        the beginning of the string
      ----------------------------------------------------------------------
        (?:                      group, but do not capture (0 or more times
                                 (matching the most amount possible)):
      ----------------------------------------------------------------------
          (                        group and capture to \1:
      ----------------------------------------------------------------------
            [smtwhfa]                any character of: 's', 'm', 't', 'w',
                                     'h', 'f', 'a'
      ----------------------------------------------------------------------
          )                        end of \1
      ----------------------------------------------------------------------
          (?!                      look ahead to see if there is not:
      ----------------------------------------------------------------------
            .*                       any character except \n (0 or more
                                     times (matching the most amount
                                     possible))
      ----------------------------------------------------------------------
            \1                       what was matched by capture \1
      ----------------------------------------------------------------------
          )                        end of look-ahead
      ----------------------------------------------------------------------
        )*                       end of grouping
      ----------------------------------------------------------------------
        $                        before an optional \n, and the end of the
                                 string
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------
        Wow, I wish I had known about YAPE::Regex::Explain a long time ago. This will provide a much greater understanding of regexes and faster troubleshooting. All hail japhy!!
Re: Re: Regex (lookahead) Confusion
by japhy (Canon) on Feb 05, 2004 at 21:53 UTC
    I think that's less efficient than: /^(?=[smtwhfa]*$)(?!.*(.).*\1)/ but I'm too busy right now to find out.
    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
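    For readers who want to see how that alternative is put together, here is a sketch of the same pattern written with /x and comments. The comments are my reading of it, not japhy's, and the test strings are borrowed from the earlier example in this thread.

```perl
use strict;
use warnings;

# japhy's pattern with /x comments added; the pattern itself is unchanged.
my $japhy = qr/
    ^
    (?= [smtwhfa]* $ )     # the whole string consists of allowed letters only
    (?! .* (.) .* \1 )     # and no character occurs twice anywhere in it
/x;

for my $str (qw(swma smqa smsa fhtm ttma t2ms)) {
    printf "%s : %s\n", $str, $str =~ $japhy ? 'OK' : 'Not OK';
}
```

    The two lookaheads split the job in two: one checks the alphabet, the other checks for duplicates, and neither consumes any characters.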
      Again, another genius here. The regex you supplied also works perfectly. As far as efficiency goes, I'm not sure it will matter much in this situation. I am interested, though, in how you would go about testing speed and efficiency for something like this. I'm sure it's not as simple (and unreliable) as loading up thousands of strings and timing execution with a watch.

      YAPE::Regex is cool man!
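      On the question of how to measure speed: one common approach is the core Benchmark module. The sketch below is only an illustration; the sample strings are an assumption, and the results will depend on the real data and the perl version, so no timings are quoted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample inputs are an assumption; real data may behave differently.
my @samples = qw(swma smqa smsa fhtm ttma t2ms smtwhfa smtwhfas);

my $bart  = qr/^(?:([smtwhfa])(?!.*\1))*$/;
my $japhy = qr/^(?=[smtwhfa]*$)(?!.*(.).*\1)/;

# Sanity check first: a speed comparison only means something
# if both patterns accept and reject the same strings.
for my $str (@samples) {
    my $ok_bart  = $str =~ $bart  ? 1 : 0;
    my $ok_japhy = $str =~ $japhy ? 1 : 0;
    die "patterns disagree on '$str'\n" if $ok_bart != $ok_japhy;
}

# Run each candidate for at least one CPU second, then print a
# table of rates and relative differences.
cmpthese( -1, {
    bart  => sub { my $n = grep { $_ =~ $bart  } @samples; return $n },
    japhy => sub { my $n = grep { $_ =~ $japhy } @samples; return $n },
} );
```

      A negative count tells cmpthese to run each sub for at least that many CPU seconds rather than a fixed number of iterations, which tends to give steadier numbers than timing a single pass by hand.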
Re: Re: Regex (lookahead) Confusion
by QM (Parson) on Feb 05, 2004 at 23:12 UTC
    Bart's regex does the trick. I was thinking of a lazy approach using sort - slower, but easier for the regex neophytes to get a handle on:
    #!/your/perl/here
    use warnings;
    use strict;

    while (<DATA>) {
        chomp;
        my $sort = join '', sort split '';
        (     ( $sort =~ /^[afhmstw]+$/ )      # only these chars
          and ( $sort !~ /([afhmstw])\1/ ) )   # no repeats
            ? print "<$_> OK\n"
            : print "<$_> Not OK\n";
    }
    __DATA__
    smsa
    smta
    stmwhas

    BADsmtaEXAMPLE
    __END__
    <smsa> Not OK
    <smta> OK
    <stmwhas> Not OK
    <> Not OK
    <BADsmtaEXAMPLE> Not OK

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of
