comment on

Assuming you start with this:

$_ = <<SENTENCE;
There once was a doggy in the window. I went into the store 
and inquired about the price, the doggy was too nice to ever 
think twice, so now I have a doggy to care for.
SENTENCE
[download]

The next statement will extract all the vowels and save each one as a separate element in the @vowels array ("y" is not counted as a vowel in this case):

my @vowels = ( /[aeiuo]/gi );
[download]

I get 53 vowels, total, counting both upper- and lowe-case (because of the "i" modifier after the regex). If you wanted to count "y" when it functions as a vowel (i.e. when it is not followed by another vowel), the regex would be like this:

my @vowels = ( /[aeiou]|y(?![aeiou])/gi );
[download]

I get 56 vowels that way. The "(?!...)" part is called a "zero-width negative-look-ahead assertion" (sounds scary, eh?), and you can look it up in the perlre man page.

By putting the "g" modifier on the regex ("match globally" -- i.e. find all occurrences of the pattern), and putting the whole thing in a list context (assigning to an array), all the matches are captured as array elements.

update: Some "grammar police" might insist that "y" can only function as a vowel when there is no other vowel either before or after it -- e.g. "y" is not a vowel in "clay" (just as "w" is not a vowel in "claw"). Still, it seems like it has to be a vowel in cases like "rhythm". Those who hold this position would make the regex like this:

my @vowels = ( /[aeiou]|(?<![aeiou])y(?![aeiou])/gi );
[download]

using "(?<!...)" -- the "zero-width negative-look-behind assertion". The "zero-width" feature means that they are not counted as part of the matched string -- the regex still only matches (and captures) a single character at a time, which is either one of "a e i o u" or else a "y" that satifies the zero-width assertions.

In reply to Re: Get Vowels from sentence by graff
in thread Get Vowels from sentence by tiny_tim

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Don't ask to ask, just ask
	PerlMonks