regex to find vowels in anyorder

Perl 6 - second systems done right

by moritz (Cardinal) on Dec 19, 2011 at 16:22 UTC

You can anchor that regex to the start of the string with ^ and have it fail faster; in practice I haven't been able to measure a difference, so either perl is clever enough, or the whole thing is determined by IO performance.

The start of some sanity?

by BrowserUk (Patriarch) on Dec 19, 2011 at 16:50 UTC

Even excluding IO, I cannot discern any appreciable difference either. @words contains 178,000 words:

[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.14288 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.12593 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13659 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.14437 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13993 seconds
[0] Perl> $c =0; $t = time; m[(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13856 seconds

[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13786 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13947 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.12269 seconds
[0] Perl> $c =0; $t = time; m[^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13944 seconds

[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.12400 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.14011 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13754 seconds
[0] Perl> $c =0; $t = time; m[(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)] and ++$c for @words; printf "Found $c matches in %.5f seconds\n", (time()-$t);;
Found 1905 matches in 0.13191 seconds
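For anyone wanting to repeat this comparison outside a REPL, here is a minimal sketch using the core Benchmark module. The word list is a small hypothetical stand-in for the 178,000-word list used above, and the names count_matches, %patterns and %tests are made up for illustration; timings will of course differ by machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data; the runs above used a 178,000-word list.
my @words = ( qw(education sequoia banana dialogue rhythm facetious) ) x 1000;

# Count the words that match a precompiled pattern.
sub count_matches {
    my ($re) = @_;
    my $c = 0;
    $_ =~ $re and ++$c for @words;
    return $c;
}

# The three variants timed above: no anchor, one leading anchor,
# and an anchor inside every lookahead.
my %patterns = (
    unanchored => qr/(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/,
    anchored   => qr/^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/,
    inner      => qr/(?=^.*a)(?=^.*e)(?=^.*i)(?=^.*o)(?=^.*u)/,
);

my %tests;
for my $name ( keys %patterns ) {
    my $re = $patterns{$name};
    $tests{$name} = sub { count_matches($re) };
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, \%tests );
```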

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re: regex to find vowels in anyorder
by SuicideJunkie (Vicar) on Dec 19, 2011 at 15:04 UTC

Smells like homework.

The best way is likely to be KISS:

if (/a/ and /e/ and /i/ and /o/ and /u/ and (rand()>0.5 or /y/))
{
...
}

by leuchuk (Novice) on Dec 19, 2011 at 16:55 UTC

I wouldn't do this as an "anded if" but in a loop with an array of vowels. As soon as the script has found the first vowel the script would continue with the next vowel but if the vowel wasn't found it would stop the whole search as it's not interesting whether the other vowels are in the string.

The performance of my solution depends on the distribution of the strings. My statistics are too weak to tell you which is faster. The longer your strings are, the worse my solution performs. If your input contains many strings and some are shorter than the number of vowels, you can skip the test for those entirely: a string of fewer than five (or six) letters evidently cannot contain five (or six) different vowels.
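A rough sketch of that idea, assuming lowercase a, e, i, o, u as the vowel list; the name has_all_vowels is made up for illustration and is not from the original post:

```perl
use strict;
use warnings;

my @vowels = qw(a e i o u);

# Returns 1 if $word contains every vowel in @vowels (case-insensitive), else 0.
sub has_all_vowels {
    my ($word) = @_;

    # A word shorter than the number of vowels cannot contain them all.
    return 0 if length($word) < @vowels;

    my $lower = lc $word;
    for my $v (@vowels) {
        # Stop at the first missing vowel; the rest need not be checked.
        return 0 if index( $lower, $v ) < 0;
    }
    return 1;
}

print "$_\n" for grep { has_all_vowels($_) } qw(education tree sequoia audio);
```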

by JavaFan (Canon) on Dec 19, 2011 at 17:58 UTC

I wouldn't do this as an "anded if" but in a loop with an array of vowels. As soon as the script has found the first vowel the script would continue with the next vowel but if the vowel wasn't found it would stop the whole search as it's not interesting whether the other vowels are in the string.

Re^4: regex to find vowels in anyorder

by ww (Archbishop) on Dec 19, 2011 at 21:06 UTC

Re^5: regex to find vowels in anyorder

by JavaFan (Canon) on Dec 20, 2011 at 01:28 UTC


Re^5: regex to find vowels in anyorder

by JavaFan (Canon) on Dec 19, 2011 at 23:31 UTC

by SuicideJunkie (Vicar) on Dec 20, 2011 at 00:17 UTC

As JavaFan and ww are discussing, your post seems to imply that you think the anded if does not short circuit when the first vowel is not found. That is not the case.

A loop to put each vowel into regexes sequentially would have the advantage of not hardcoding the vowels, though it would do at least as much work.
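The short-circuit behaviour is easy to check with a counter. In this sketch, has() is a hypothetical helper that records how many single-vowel tests actually run:

```perl
use strict;
use warnings;

my $tests_run = 0;

# Hypothetical helper: test for one vowel and count the call.
sub has {
    my ( $word, $vowel ) = @_;
    $tests_run++;
    return index( $word, $vowel ) >= 0;
}

my $word = 'rhythm';    # no 'a', so the very first test fails

# Parentheses are needed because "and" binds more loosely than "=".
my $all = (     has( $word, 'a' )
            and has( $word, 'e' )
            and has( $word, 'i' )
            and has( $word, 'o' )
            and has( $word, 'u' ) );

print "tests run: $tests_run\n";    # prints "tests run: 1"
```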

by varghees (Novice) on Dec 19, 2011 at 15:10 UTC

Not homework, but this question came up when we Perl developers were discussing regexes... I am looking for a single expression to find it.

by Corion (Patriarch) on Dec 19, 2011 at 15:23 UTC

perlre

by MidLifeXis (Monsignor) on Dec 19, 2011 at 15:22 UTC

Perhaps the 'Look-Around Assertions' section of perlre might be helpful. See specifically (?=pattern). I would think that a set of five of these would work.
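A minimal sketch of that suggestion, with five positive lookaheads anchored at the start of the string. The /i flag and the sample words are assumptions added for illustration:

```perl
use strict;
use warnings;

# Each (?=.*X) asserts that X occurs somewhere ahead without consuming
# any text, so the five conditions can be satisfied in any order.
my $all_vowels = qr/^(?=.*a)(?=.*e)(?=.*i)(?=.*o)(?=.*u)/i;

for my $word (qw(education sequoia banana)) {
    printf "%-10s %s\n", $word, ( $word =~ $all_vowels ? 'match' : 'no match' );
}
```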

--MidLifeXis

Super Search where title contains "vowel"

Re: regex to find vowels in anyorder
by toolic (Bishop) on Dec 19, 2011 at 16:34 UTC

2011-12-19 varghees regex to find vowels in anyorder SoPW

2010-04-01 vennila count words which contain all vowels in a file. SoPW

2009-03-19 paragkalra Finding vowels SoPW

2007-08-16 m0ve vowel mutations SoPW

2007-01-03 tiny_tim Get Vowels from sentence SoPW

2004-02-10 NodeReaper (DUPLICATE) REGEX to match all vowels SoPW

2004-02-10 D'Oh!! regex testing for ALL of the vowels in a scalar SoPW

2003-06-06 tall_man Finding vowels in a cryptogram SoPW

2001-08-16 chainsawed vowel_search Snippet

Re: regex to find vowels in anyorder (obfuscated)
by eyepopslikeamosquito (Archbishop) on Dec 19, 2011 at 21:01 UTC

This one can be easily solved without using a regex. For example, a one-liner featuring the good ol' tr (aka y) operator, punctuated with just & characters (sorry couldn't resist):

perl -ne 'y&a&&&&y&e&&&&y&i&&&&y&o&&&&y&u&&&&print' words.txt
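For readers who would rather not squint at the ampersands, the same idea can be sketched with tr and conventional delimiters. The function name has_all_vowels_tr is made up for illustration:

```perl
use strict;
use warnings;

# tr/a// with an empty replacement list and no modifiers leaves the
# string unchanged and returns the number of characters it matched,
# so each term is true only when that vowel is present.
sub has_all_vowels_tr {
    my ($word) = @_;
    return     ( $word =~ tr/a// )
            && ( $word =~ tr/e// )
            && ( $word =~ tr/i// )
            && ( $word =~ tr/o// )
            && ( $word =~ tr/u// );
}

print "$_\n" for grep { has_all_vowels_tr($_) } qw(education banana sequoia);
```

Like the one-liner, this sketch is case-sensitive.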


Re: regex to find vowels in anyorder
by AnomalousMonk (Archbishop) on Dec 19, 2011 at 19:31 UTC

... all 5 vowels in it a,e,i,o,u.

Just as a cautionary side-note, the character set [a,e,i,o,u] (which is what I assume was originally posted without awareness of the effect of square brackets) includes ',' (comma) as a vowel! Please see Markup in the Monastery and Writeup Formatting Tips.
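A quick illustration of the difference, using made-up variable names:

```perl
use strict;
use warnings;

my $with_commas = qr/[a,e,i,o,u]/;    # the commas are literal members of the class
my $vowels_only = qr/[aeiou]/;

for my $s ( ',', 'xyz', 'e' ) {
    printf "%-5s with_commas=%d vowels_only=%d\n",
        "'$s'",
        ( $s =~ $with_commas ? 1 : 0 ),
        ( $s =~ $vowels_only ? 1 : 0 );
}
```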

by JavaFan (Canon) on Dec 19, 2011 at 23:35 UTC

I don't think the OP was posting a character set -- he's just listing what he considers vowels.

Looking ahead and looking behind

Re: regex to find vowels in anyorder
by Not_a_Number (Prior) on Dec 19, 2011 at 20:20 UTC

I'm trying to find a regex that will...

You don't need no steenkin' regex:

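use feature 'say';    # needed for say()

open my $fh, '<', 'whatever' or die "Cannot open 'whatever': $!"; # or whatever

my %hash = ( a => 0, e => 1, i => 2, o => 3, u => 4 );

while ( <$fh> ) {
  chomp;
  
  my $copy = $_;
  my @array;

  while ( my $char = lc chop $copy ) {
    no warnings 'uninitialized';
    # Bug (see Update 2): a non-vowel gives undef, which is used as index 0
    $array[ $hash{ $char } ] = 1;
  }

  say if ( grep $_, @array ) == 5;
}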

Pretty fast, too, you'll find...

no warnings 'uninitialized';

use warnings

Comments welcome.

Update 2: Oops, just realised that my code doesn't work! Change the contents of the inner while loop to:

$array[ $hash{ $char } ] = 1 if defined $hash{ $char };

Which is what I had originally, before playing with no warnings 'uninitialized';. And then I didn't test properly.
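Put together, the corrected approach might look like the following sketch. It is wrapped in a function (has_all_vowels_slots, a made-up name) and uses split instead of chop, so it is a variation on the code above rather than the exact original:

```perl
use strict;
use warnings;

my %slot = ( a => 0, e => 1, i => 2, o => 3, u => 4 );

# Returns true if $word contains all five vowels, in any order and any case.
sub has_all_vowels_slots {
    my ($word) = @_;
    my @seen;
    for my $char ( split //, lc $word ) {
        # Only vowels have a slot; skipping everything else avoids
        # undef being treated as index 0.
        $seen[ $slot{$char} ] = 1 if defined $slot{$char};
    }
    # grep in scalar context counts the slots that were filled.
    return 5 == grep { $_ } @seen;
}

print "$_\n" for grep { has_all_vowels_slots($_) } qw(Education xyz sequoia);
```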

Mea culpa.


Re: regex to find vowels in anyorder
by kennethk (Abbot) on Dec 19, 2011 at 15:28 UTC

Re: regex to find vowels in anyorder
by pvaldes (Chaplain) on Dec 19, 2011 at 15:46 UTC

Nested notation and uppercase

if (/a/i) {
    if (/e/i) {
        if (/i/i) {
            if (/o/i) {
                if (/u/i) {
                    print "we found the five vowels";
                }
            }
        }
    }
}

by Marshall (Canon) on Dec 19, 2011 at 16:06 UTC

I think BrowserUk's solution is going to be faster. Beyond that, I would not use nested ifs when you mean "and" or "&&". Get rid of the 4 unnecessary levels of indentation.
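For instance, the nested version could be flattened along these lines. This is a sketch: five_vowels is a made-up name, and the sample words are only for illustration:

```perl
use strict;
use warnings;

# Same five case-insensitive tests, joined with && instead of nesting.
sub five_vowels {
    my ($s) = @_;
    return $s =~ /a/i && $s =~ /e/i && $s =~ /i/i && $s =~ /o/i && $s =~ /u/i;
}

for my $word (qw(Sequoia banana)) {
    print "we found the five vowels in '$word'\n" if five_vowels($word);
}
```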