Get Vowels from sentence

by tiny_tim (Sexton)
on Jan 04, 2007 at 04:58 UTC ( [id://592867]=perlquestion )

tiny_tim has asked for the wisdom of the Perl Monks concerning the following question:

I would like to be able to pull all the vowels out of the following sentence:

There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.

I have been trying a few regular expressions, but am fairly new to Perl, and am not sure how to go about this...

Replies are listed 'Best First'.
Re: Get Vowels from sentence
by graff (Chancellor) on Jan 04, 2007 at 06:21 UTC
    Assuming you start with this:
    $_ = <<SENTENCE;
    There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.
    SENTENCE
    The next statement will extract all the vowels and save each one as a separate element in the @vowels array ("y" is not counted as a vowel in this case):
    my @vowels = ( /[aeiuo]/gi );
    I get 53 vowels, total, counting both upper- and lower-case (because of the "i" modifier after the regex). If you wanted to count "y" when it functions as a vowel (i.e. when it is not followed by another vowel), the regex would be like this:
    my @vowels = ( /[aeiou]|y(?![aeiou])/gi );
    I get 56 vowels that way. The "(?!...)" part is called a "zero-width negative-look-ahead assertion" (sounds scary, eh?), and you can look it up in the perlre man page.

    By putting the "g" modifier on the regex ("match globally" -- i.e. find all occurrences of the pattern), and putting the whole thing in a list context (assigning to an array), all the matches are captured as array elements.
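    A minimal sketch of the context behaviour described above. It assumes the sentence is held in a variable called $sentence (a name chosen here for illustration; the post itself matches against $_). It contrasts the list-context form with the scalar-context loop that would otherwise be needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $sentence is an illustrative name; the post above matches against $_.
my $sentence = 'There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.';

# List context: with /g the match returns every matched substring at once.
my @vowels = $sentence =~ /[aeiou]/gi;

# Counting idiom: the empty-list assignment forces list context,
# and assigning that to a scalar yields the number of matches.
my $count = () = $sentence =~ /[aeiou]/gi;

# Scalar context: /g finds one match per evaluation, so it needs a loop.
my $loop_count = 0;
$loop_count++ while $sentence =~ /[aeiou]/gi;

print scalar(@vowels), " $count $loop_count\n";   # prints "53 53 53"
print join('', @vowels[0 .. 4]), "\n";            # prints "eeoea"
```

    All three forms agree on 53 for this sentence, which matches the count given above.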

    update: Some "grammar police" might insist that "y" can only function as a vowel when there is no other vowel either before or after it -- e.g. "y" is not a vowel in "clay" (just as "w" is not a vowel in "claw"). Still, it seems like it has to be a vowel in cases like "rhythm". Those who hold this position would make the regex like this:

    my @vowels = ( /[aeiou]|(?<![aeiou])y(?![aeiou])/gi );
    using "(?<!...)" -- the "zero-width negative-look-behind assertion". The "zero-width" feature means that they are not counted as part of the matched string -- the regex still only matches (and captures) a single character at a time, which is either one of "a e i o u" or else a "y" that satisfies the zero-width assertions.
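    To see how the three patterns differ, here is a small sketch that runs each of them over a few sample words. The helper vowels_in() and the hash keys are names made up for this illustration, and the word list is arbitrary:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three patterns from the post above, compiled once each.
# The hash keys are illustrative labels only.
my %pattern = (
    plain      => qr/[aeiou]/i,
    lookahead  => qr/[aeiou]|y(?![aeiou])/i,
    lookaround => qr/[aeiou]|(?<![aeiou])y(?![aeiou])/i,
);

# Return the letters of $word matched by $re, joined into one string.
sub vowels_in {
    my ($word, $re) = @_;
    return join '', $word =~ /$re/g;
}

for my $word (qw(clay rhythm yes doggy)) {
    my @found = map { vowels_in($word, $pattern{$_}) } qw(plain lookahead lookaround);
    printf "%-7s plain=%-3s lookahead=%-3s lookaround=%s\n", $word, @found;
}
```

    With these inputs, "clay" gives "ay" under the look-ahead pattern but only "a" once the look-behind is added, while "rhythm" and "doggy" keep their "y" under both, and the "y" in "yes" is never counted.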
      Thank you. This is what I wanted to do.
      Hi, digging up an old thread. Here's my version of when to use "an" or "a" before a word. Treats ura*, uga*, uni* as non-vowels. Honest and Hour have silent consonants. Have I missed out anything? Or is there a better way to do this?
      if ( $something =~ /^[aeiou]/i && $something !~ /(un|uga|ura)/i
           || $something =~ /hour/i || something =~ /honest/i ) {
          # First letter of $something is a vowel
          $print = "an $something";
      } else {
          $print = "a $something";
      }
        The thread wasn't really about handling "a" vs. "an" correctly, but that's okay. As for your code snippet... apart from forgetting the "$" on the fourth mention of "something", there are some problems...

        Since your three "exceptional case" conditions are not anchored to be string-initial, they'll misfire on words like "aunt", "aura", "man-hour", "dishonest", etc. Also, after you fix the anchoring, don't forget that adjectives can occur between the indefinite article and the noun, and many adjectives can begin with "un" as a negative prefix -- e.g. if your code snippet is preceded by:

        $something = "untimely death";
        you will get Bad English ("a" instead of "an"). In fact, it's pretty hard to handle word-initial "u" -- consider "unanswered" vs. "unanimous", "uninformed" vs. "uniform", etc. (And folks may disagree about cases like "Ugandan", "Ugaritic" and even "Uruguayan".) And then there's all those words starting with "eur"... Unless you take the time to plug in a pronouncing dictionary, you'll just have to put up with some mistakes.
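        A rough sketch of the anchoring point, using the example words mentioned above. The two subroutine names are invented for this comparison. The first keeps the posted conditions (with the missing "$" restored), and the second only adds "^" anchors, so it still gets "untimely death" wrong, as predicted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The posted conditions with the missing "$" restored, still unanchored.
# (article_unanchored is a name invented for this sketch.)
sub article_unanchored {
    my ($w) = @_;
    my $use_an = ( $w =~ /^[aeiou]/i && $w !~ /(un|uga|ura)/i
                || $w =~ /hour/i || $w =~ /honest/i );
    return $use_an ? "an $w" : "a $w";
}

# Same logic with every exception anchored to the start of the string.
sub article_anchored {
    my ($w) = @_;
    my $use_an = ( $w =~ /^[aeiou]/i && $w !~ /^(?:un|uga|ura)/i
                || $w =~ /^(?:hour|honest)/i );
    return $use_an ? "an $w" : "a $w";
}

for my $w ('aunt', 'aura', 'man-hour', 'dishonest', 'untimely death') {
    printf "%-22s %s\n", article_unanchored($w), article_anchored($w);
}
```

        Anchoring fixes "aunt", "aura", "man-hour" and "dishonest", but both versions print "a untimely death", which is the negative-prefix problem described above.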

        Still, here's a stab that tries to handle most cases in a reasonable way -- note how it sets the initial "u" problem apart as a separate condition, allowing it to be more complicated all by itself. (This code can be run with target words as command-line args, but the subroutine is self-contained and easy to modularize.)

        #!/usr/bin/perl
        use strict;

        sub indef_article {
            local $_ = shift;
            my $article;
            if ( /^(e)?u(\w+)/i ) {
                my $e = $1;
                local $_ = $2;
                $article = ( $e or /^(?:nanim|ni(?!n)|[gr][aeu])/i ) ? "a" : "an";
                # assumes y-glide pronunciations for Uruguay, Uganda, etc.
            } else {
                $article = ( /^(?:[aeio]|ho(?:ur|nest))/i ) ? "an" : "a";
            }
            return "$article $_";
        }

        print indef_article( $_ ).$/ for ( @ARGV );
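        For anyone who wants to check the subroutine without typing command-line args, here is a small table-driven harness. The sub is copied from the post above; the @cases list and its expected strings are my own choice of test words from this thread, not part of the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copy of indef_article() from the post above, unchanged in behavior.
sub indef_article {
    local $_ = shift;
    my $article;
    if ( /^(e)?u(\w+)/i ) {
        my $e = $1;
        local $_ = $2;
        # assumes y-glide pronunciations for Uruguay, Uganda, etc.
        $article = ( $e or /^(?:nanim|ni(?!n)|[gr][aeu])/i ) ? "a" : "an";
    } else {
        $article = ( /^(?:[aeio]|ho(?:ur|nest))/i ) ? "an" : "a";
    }
    return "$article $_";
}

# Illustrative expectations (an assumption of this sketch, not from the post).
my @cases = (
    [ 'uniform'        => 'a uniform' ],
    [ 'uninformed'     => 'an uninformed' ],
    [ 'unanimous'      => 'a unanimous' ],
    [ 'unanswered'     => 'an unanswered' ],
    [ 'untimely death' => 'an untimely death' ],
    [ 'Uganda'         => 'a Uganda' ],
    [ 'Europe'         => 'a Europe' ],
    [ 'hour'           => 'an hour' ],
    [ 'honest'         => 'an honest' ],
    [ 'apple'          => 'an apple' ],
    [ 'doggy'          => 'a doggy' ],
);

my $failures = 0;
for my $case (@cases) {
    my ($word, $want) = @$case;
    my $got = indef_article($word);
    my $ok  = $got eq $want;
    $failures++ unless $ok;
    printf "%-4s %s\n", ($ok ? 'ok' : 'FAIL'), $got;
}
print "$failures failure(s)\n";
```

        All of the listed cases pass. As expected without a pronouncing dictionary, some words still slip through: "user", for instance, comes out as "an user" with these rules.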
Re: Get Vowels from sentence
by ysth (Canon) on Jan 04, 2007 at 07:58 UTC
    If what you want is to end up with a string containing just the vowels, I would use the transliterate operator:
    $sentence =~ y/aeiou//cd;
      Thanking you also.
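    One thing to watch with the transliteration approach: as written it lists only lowercase vowels, and tr/// (of which y/// is a synonym) has no case-insensitive modifier, so the two capital "I"s in the sentence are deleted as well. A short sketch of the difference follows; the variable names are made up here, and copies are used so the original string is left alone:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sentence = 'There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.';

# As posted: /c complements the list and /d deletes, so everything
# except lowercase vowels is removed, including the capital "I"s.
(my $lower_only = $sentence) =~ tr/aeiou//cd;

# Listing both cases keeps the uppercase vowels too.
(my $both_cases = $sentence) =~ tr/aeiouAEIOU//cd;

# With an empty replacement list and no /d, tr/// only counts.
my $count = ($sentence =~ tr/aeiouAEIOU//);

print length($lower_only), ' ', length($both_cases), " $count\n";   # prints "51 53 53"
```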
Re: Get Vowels from sentence
by ferreira (Chaplain) on Jan 04, 2007 at 10:04 UTC

    If what you want is to end up with a histogram-like structure, stuff the found vowels into a hash. You may try something like this:

    my $phrase = 'There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.';
    my %vowels;
    while ($phrase =~ /([aeiou])/gi) {
        $vowels{lc $1}++;
    }
    use YAML;
    print Dump \%vowels;
    __END__
    the output would look like
    ---
    a: 8
    e: 17
    i: 11
    o: 15
    u: 2
      What does "use YAML" mean? Can you explain?

        See use and YAML.

        The use statement loads the module YAML.
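    If YAML is not installed (as far as I know it is a CPAN module rather than part of the core distribution), the same histogram can be printed with core Perl only. This is just a sketch of one alternative, not a replacement for the code above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $phrase = 'There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.';

# Count each vowel, folding upper and lower case together.
my %vowels;
$vowels{lc $1}++ while $phrase =~ /([aeiou])/gi;

# Print the counts in alphabetical order, one "letter: count" per line.
for my $v (sort keys %vowels) {
    printf "%s: %d\n", $v, $vowels{$v};
}
```

    The output has the same counts as the YAML dump shown above, minus the leading "---" line.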

Re: Get Vowels from sentence
by SheridanCat (Pilgrim) on Jan 04, 2007 at 05:07 UTC
    When you say "pull all vowels" and "get vowels" what are you hoping to end up with? Do you want all the vowels returned to you in some structure or do you want the original string back without vowels?
      If it's the latter you're after, try this:
      my $string = "There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for.";
      $string =~ s/[aeiou]//gi;
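      Note that the substitution above modifies $string in place. If the original is still needed, one option is to work on a copy. The sketch below uses a shortened sentence and variable names of my own ($stripped, $copy, $removed), and also shows tr/// with /d, which reports how many characters it deleted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened sample text, for brevity.
my $string = 'There once was a doggy in the window.';

# Copy first, then substitute on the copy; $string is left untouched.
(my $stripped = $string) =~ s/[aeiou]//gi;

# tr/// with /d deletes the listed characters and returns the number removed.
my $copy    = $string;
my $removed = ($copy =~ tr/aeiouAEIOU//d);

print "$stripped\n";   # prints "Thr nc ws  dggy n th wndw."
print "$removed\n";    # prints "11"
```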
Re: Get Vowels from sentence
by PreferredUserName (Pilgrim) on Jan 04, 2007 at 18:12 UTC
    echo "There once was a doggy in the window. I went into the store and inquired about the price, the doggy was too nice to ever think twice, so now I have a doggy to care for." | perl -pe 's/[aeiou]//gi; s/y//gi if rand() > .5'

    :)

Re: Get Vowels from sentence
by tiny_tim (Sexton) on Jan 04, 2007 at 22:40 UTC
    Wow, thanks for all the replies. I am still learning, and your responses have been great. Thanks !
