http://qs321.pair.com?node_id=944253

varghees has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to find a regex that will verify that a string has all 5 vowels in it a,e,i,o,u. I don't care what order they're in or if they appear twice.

Re: regex to find vowels in anyorder
by BrowserUk (Patriarch) on Dec 19, 2011 at 15:45 UTC

    @words = do{ local @ARGV = 'words.txt'; <> }; chomp @words;;

    m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and print for @words;;
    aboideau aboideaus aboideaux aboiteau aboiteaus aboiteaux absolutive absolutize absolutized absolutizes abstemious abstemiously abstemiousness ...

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      You can anchor that regex to the start of the string with ^ and have it fail faster; in practice I haven't been able to measure a difference, so either perl is clever enough, or the whole thing is determined by IO performance.

        Even excluding IO, I cannot discern any appreciable difference either. @words contains 178,000 words:

        [0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.14288 seconds
        [0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.12593 seconds
        [0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13659 seconds
        [0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.14437 seconds
        [0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13993 seconds
        [0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13856 seconds

        [0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13786 seconds
        [0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13947 seconds
        [0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.12269 seconds
        [0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13944 seconds

        [0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.12400 seconds
        [0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.14011 seconds
        [0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13754 seconds
        [0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
        Found 1905 matches in 0.13191 seconds
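For a more repeatable comparison of the three variants timed above, the core Benchmark module could be used. This is only a sketch: the inline word list is a small stand-in for words.txt, and the labels and the count_matches helper are made up for illustration. Timings will vary by machine and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Stand-in data (assumption): the thread used a 178,000-word list from words.txt.
my @words = ( qw(abstemious facetious sequoia rhythm banana education queue) ) x 200;

# The three patterns from the transcript above; the labels are hypothetical.
my %pattern = (
    unanchored  => qr/(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/,
    anchored    => qr/^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/,
    each_anchor => qr/(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)/,
);

# Count how many words match the given compiled pattern.
sub count_matches {
    my ($re) = @_;
    return scalar grep { $_ =~ $re } @words;
}

# Wrap each pattern in a closure so Benchmark can time it.
my %code;
for my $label ( keys %pattern ) {
    my $re = $pattern{$label};
    $code{$label} = sub { count_matches($re) };
}

# Run every variant 200 times and print a comparison table.
cmpthese( 200, \%code );
```

With a list this small, Benchmark may warn that there were too few iterations for a reliable count; pointing @words at a real word list or raising the iteration count should give steadier numbers.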


Re: regex to find vowels in anyorder
by SuicideJunkie (Vicar) on Dec 19, 2011 at 15:04 UTC

    Smells like homework.

    The best way is likely to be KISS:

    if (/a/ and /e/ and /i/ and /o/ and /u/ and (rand()>0.5 or /y/)) { ... }

      I wouldn't do this as an "anded if" but in a loop with an array of vowels. As soon as the script has found the first vowel the script would continue with the next vowel but if the vowel wasn't found it would stop the whole search as it's not interesting whether the other vowels are in the string.

      The performance of my solution depends on the distribution of the strings. My statistics are too weak to tell you which is faster. The longer your strings are, the worse my solution performs. If your input contains a lot of strings, and some are shorter than the number of vowels, you could skip the test for those entirely: a string of fewer than five (or six) letters evidently cannot contain five (or six) different vowels.

        I wouldn't do this as an "anded if" but in a loop with an array of vowels. As soon as the script has found the first vowel the script would continue with the next vowel but if the vowel wasn't found it would stop the whole search as it's not interesting whether the other vowels are in the string.
        Uhm, in which way does an "anded if" continue with a next vowel after a previous vowel wasn't found?

        As JavaFan and ww are discussing, your post seems to imply that you think the anded if does not short circuit when the first vowel is not found. That is not the case.

        A loop to put each vowel into regexes sequentially would have the advantage of not hardcoding the vowels, though it would do at least as much work.
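The loop variant discussed above might look something like this. It is a minimal sketch, not anyone's posted code: has_all_vowels is a hypothetical helper, and the length shortcut assumes the vowel list has no repeated entries.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $string contains every character in @$vowels.
sub has_all_vowels {
    my ( $string, $vowels ) = @_;
    $vowels ||= [qw(a e i o u)];

    # A string shorter than the (distinct) vowel list cannot contain them all.
    return 0 if length($string) < @$vowels;

    for my $vowel (@$vowels) {
        # Stop at the first missing vowel, just as the and-chain short-circuits.
        return 0 unless $string =~ /\Q$vowel\E/;
    }
    return 1;
}

# Prints abstemious and sequoia, one per line.
print "$_\n" for grep { has_all_vowels($_) } qw(abstemious sequoia banana);
```

Passing a different array reference as the second argument changes the set of required letters without touching the code, which is the one advantage the loop has over the hardcoded chain.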

      Not homework, but this question came up while we Perl developers were discussing regexes... I am looking for a single expression that does it.
        So, what code did the discussion between you Perl developers produce? What advantages are there to it? What disadvantages? If you are developers, you will surely have written code. If none of you developers has an idea, it surprises me that none of you developers has a question about perlre.

        Perhaps the 'Look-Around Assertions' section of perlre might be helpful. See specifically (?=pattern). I would think that a set of five of these would work.

        --MidLifeXis
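One way to get "a set of five of these" into a single expression without typing each lookahead by hand is to generate them. A rough sketch, assuming a hypothetical all_of_regex helper; the leading ^ and the /s flag are my additions so the pattern is tried once and . also crosses newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex with a (?=.*X) lookahead per character.
sub all_of_regex {
    my @chars = @_;
    my $lookaheads = join '', map { '(?=.*' . quotemeta($_) . ')' } @chars;
    return qr/^$lookaheads/s;
}

my $all_vowels = all_of_regex(qw(a e i o u));

for my $word (qw(education banana)) {
    printf "%s: %s\n", $word, ( $word =~ $all_vowels ? 'all five' : 'missing some' );
}
```

Each lookahead is checked from the same starting position and consumes nothing, so the order in which the vowels appear in the string does not matter.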

Re: regex to find vowels in anyorder
by toolic (Bishop) on Dec 19, 2011 at 16:34 UTC
Re: regex to find vowels in anyorder (obfuscated)
by eyepopslikeamosquito (Archbishop) on Dec 19, 2011 at 21:01 UTC

    This one can be easily solved without using a regex. For example, a one-liner featuring the good ol' tr (aka y) operator, punctuated with just & characters (sorry couldn't resist):

    perl -ne 'y&a&&&&y&e&&&&y&i&&&&y&o&&&&y&u&&&&print' words.txt
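For anyone squinting at the ampersands: as I read it, the one-liner uses & as the tr delimiter, so it parses as tr/a// && tr/e// && tr/i// && tr/o// && tr/u// && print. A de-obfuscated sketch of the same idea, with a hypothetical function name:

```perl
use strict;
use warnings;

# tr with an empty replacement list and no /d leaves the string unchanged and
# returns the number of matching characters, so each term is true exactly
# when that vowel occurs at least once.
sub has_all_vowels_tr {
    my ($string) = @_;
    return (   $string =~ tr/a//
            && $string =~ tr/e//
            && $string =~ tr/i//
            && $string =~ tr/o//
            && $string =~ tr/u// ) ? 1 : 0;
}

# Prints facetious only.
print "$_\n" for grep { has_all_vowels_tr($_) } qw(facetious rhythm queue);
```

Note that tr does not interpolate variables, so the vowels have to be spelled out in the source this way.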

Re: regex to find vowels in anyorder
by AnomalousMonk (Archbishop) on Dec 19, 2011 at 19:31 UTC
    ... all 5 vowels in it a,e,i,o,u.

    Just as a cautionary side-note, the character set  [a,e,i,o,u] (which is what I assume was originally posted without awareness of the effect of square brackets) includes ',' (comma) as a vowel! Please see Markup in the Monastery and Writeup Formatting Tips.

      I don't think the OP was posting a character set -- he's just listing what he considers vowels.
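Whether or not the OP meant a character class, the pitfall itself is easy to demonstrate. A small sketch (the variable names are just for illustration):

```perl
use strict;
use warnings;

my $with_commas    = qr/[a,e,i,o,u]/;    # the comma is a member of this class
my $without_commas = qr/[aeiou]/;        # vowels only

for my $string ( 'a', ',', 'x' ) {
    printf "%-3s with_commas=%d without_commas=%d\n",
        "'$string'",
        ( $string =~ $with_commas    ? 1 : 0 ),
        ( $string =~ $without_commas ? 1 : 0 );
}
```

The lone comma matches the first class but not the second, which is the behaviour the side-note warns about.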
Re: regex to find vowels in anyorder
by Not_a_Number (Prior) on Dec 19, 2011 at 20:20 UTC
    I'm trying to find a regex that will...

    You don't need no steenkin' regex:

    open my $fh, '<', 'whatever'; # or whatever

    my %hash = ( a => 0, e => 1, i => 2, o => 3, u => 4 );

    while ( <$fh> ) {
        chomp;
        my $copy = $_;
        my @array;
        while ( my $char = lc chop $copy ) {
            no warnings 'uninitialized';
            $array[ $hash{ $char } ] = 1;
        }
        say if ( grep $_, @array ) == 5;
    }

    Pretty fast, too, you'll find...

    Update: Hmm, if you take out the line no warnings 'uninitialized'; (and therefore remove use warnings or whatever from the start of the code), it seems to run nearly 20% faster still...

    Comments welcome.

    Update 2: Oops, just realised that my code doesn't work! Change the contents of the inner while loop to:

    $array[ $hash{ $char } ] = 1 if defined $hash{ $char };

    Which is what I had originally, before playing with no warnings 'uninitialized';. And then I didn't test properly.

    Mea culpa.
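Putting the Update 2 fix together, a self-contained version might look like the sketch below. It is not the code as posted: the function and variable names are hypothetical, it takes a string rather than reading a file, and it walks the characters with split instead of the original chop loop.

```perl
use strict;
use warnings;
use feature 'say';

# Slot number for each vowel, as in the %hash above.
my %index_of = ( a => 0, e => 1, i => 2, o => 3, u => 4 );

# Hypothetical helper: true if $word contains all five vowels, ignoring case.
sub has_all_vowels_lookup {
    my ($word) = @_;
    my @seen;
    for my $char ( split //, lc $word ) {
        # Only vowels have a slot; every other character is skipped.
        $seen[ $index_of{$char} ] = 1 if defined $index_of{$char};
    }
    # grep in scalar context counts the slots that were set.
    return ( grep { $_ } @seen ) == 5;
}

# Prints Sequoia and facetious, one per line.
say for grep { has_all_vowels_lookup($_) } qw(Sequoia rhythm facetious);
```

The defined check is what keeps non-vowel characters from landing in slot 0, which is the bug the unguarded assignment under no warnings 'uninitialized' introduced.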

Re: regex to find vowels in anyorder
by kennethk (Abbot) on Dec 19, 2011 at 15:28 UTC
    Since you have 5 independent, overlapping searches you wish to perform simultaneously, you necessarily need something that won't consume letters on matching (assuming you don't want to use embedded code to cache results in a hash). Variable width match without consuming == look-ahead, so I'd start looking there. Looking ahead and looking behind.
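To illustrate the "won't consume letters" point with just two vowels, here is a small sketch (the variable names are illustrative only). A pattern that consumes the a forces the e to come later in the string, while two lookaheads are each tested from the same position.

```perl
use strict;
use warnings;

my $consuming = qr/a.*e/;               # 'a' is consumed, so 'e' must follow it
my $lookahead = qr/^(?=.*a)(?=.*e)/;    # both checks start from position 0

for my $word (qw(cake team)) {
    printf "%-5s consuming=%d lookahead=%d\n", $word,
        ( $word =~ $consuming ? 1 : 0 ),
        ( $word =~ $lookahead ? 1 : 0 );
}
# cake  consuming=1 lookahead=1
# team  consuming=0 lookahead=1
```

Extending the consuming form to five vowels in any order would need an alternative for every ordering, which is why the lookahead form is the natural starting point.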
Re: regex to find vowels in anyorder
by pvaldes (Chaplain) on Dec 19, 2011 at 15:46 UTC

    Nested notation and uppercase

    if (/a/i){
        if(/e/i){
            if(/i/i){
                if(/o/i){
                    if(/u/i){
                        print "we found the five vowels";
                    }
                }
            }
        }
    }
      I think BrowserUk's solution is going to be faster. But past that I would not use nested if's when you mean "and" or "&&". Get rid of 4 unnecessary levels of indentation.
        I think BrowserUk's solution is going to be faster.
        That would not be my guess, unless it's the /i that's killing the performance. /a/ will not use the regexp engine; the optimizer will handle it. If speed is an issue, and you want to be case-insensitive, my bet would go to:
        if ((/a/ || /A/) && (/e/ || /E/) && (/i/ || /I/) && (/o/ || /O/) && (/u/ || /U/)) { ... }
        but I'm too lazy to come up with a good benchmark (which should test both matches and non-matches). And if the query set were English words, I'd order the vowels from least frequently occurring to most frequently occurring (probably u-i-o-a-e, but I'd have to look that up), in order to fail faster.

        Of course, it's also very likely speed doesn't matter at all.
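Rather than looking the vowel order up, it could be measured on the actual data. A minimal sketch, assuming a hypothetical vowels_rarest_first helper and a toy word list; on a real dictionary the resulting order may well differ from the guess above.

```perl
use strict;
use warnings;

# Hypothetical helper: order the vowels from rarest to most common in @words,
# counting each word at most once per vowel. Ties are broken alphabetically.
sub vowels_rarest_first {
    my @words = @_;
    my %count = map { $_ => 0 } qw(a e i o u);
    for my $word (@words) {
        my $lower = lc $word;
        for my $vowel ( keys %count ) {
            $count{$vowel}++ if index( $lower, $vowel ) >= 0;
        }
    }
    return sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count;
}

# Toy data only; feed it the real word list to get a meaningful order.
my @order = vowels_rarest_first(qw(tea eat ate sea oat));
print "@order\n";    # i u o e a
```

Testing the vowels in that order lets a chain of && tests reject most non-matching strings on the first or second check.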
