PerlMonks  

Re^6: putting text into array word by word

by jms53 (Monk)
on Jan 09, 2012 at 23:19 UTC


in reply to Re^5: putting text into array word by word
in thread putting text into array word by word

During the foreach loop I remove punctuation and convert each word to lowercase (so that "No" and "no" count as the same word).
while (<FILE>) {
    my @these_words = split(' ', $_);
    foreach my $this_word (@these_words) {
        $this_word =~ s/[[:punct:]]//g;    # strip all punctuation
        $this_word = lc($this_word);       # normalize case
        push @all_words, $this_word;
    }
}

Replies are listed 'Best First'.
Re^7: putting text into array word by word
by Not_a_Number (Prior) on Jan 10, 2012 at 10:35 UTC
    $this_word =~ s/[[:punct:]]//g;

    The only problem with that approach is that it removes internal punctuation (i.e. apostrophes) as well, so that I'll becomes ill, she'd becomes shed, etc. ('Why was Virginia Woolf so obsessed with sheds?' I hear someone ask.)

    I'd use this instead:

    $this_word =~ s/^[[:punct:]]+//; # Remove leading punct.
    $this_word =~ s/[[:punct:]]+$//; # Remove trailing punct.
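
To make the difference concrete, here is a minimal sketch that runs both approaches over the same words. The helper names `strip_all_punct` and `trim_edge_punct` and the sample sentence are made up for illustration; they are not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: the original approach, removing every punctuation character.
sub strip_all_punct {
    my ($word) = @_;
    $word =~ s/[[:punct:]]//g;
    return lc $word;
}

# Hypothetical helper: the suggested approach, trimming only the edges.
sub trim_edge_punct {
    my ($word) = @_;
    $word =~ s/^[[:punct:]]+//;    # Remove leading punct.
    $word =~ s/[[:punct:]]+$//;    # Remove trailing punct.
    return lc $word;
}

# Illustrative input, not taken from the original poster's file.
my $line = q{"I'll wait," she'd said.};
for my $word (split ' ', $line) {
    printf "%-8s all: %-6s edges: %s\n",
        $word, strip_all_punct($word), trim_edge_punct($word);
}
```

With this input the first helper turns I'll into ill and she'd into shed, while the second keeps i'll and she'd and still drops the surrounding quotes, comma and full stop.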

    Update: added Virginia Woolf sentence.

      Hadn't thought about that. Good catch, thanks!

        OK, let's extract another worm from the can we've opened. Say we want to count all the times 'John' appears in a text. Given a sentence:

        Was it John or John's brother?

        Should the count be one or two?
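
Either answer can be defended; it depends on whether a possessive counts as the same word. As a rough sketch of the two policies (the `count_word` helper and its `$fold_possessive` flag are hypothetical names, not something proposed in the thread):

```perl
use strict;
use warnings;

# Hypothetical helper: count how often $target appears among the words of $text.
# When $fold_possessive is true, a trailing 's is dropped before comparing,
# so "John's" is treated as an occurrence of "John".
sub count_word {
    my ($text, $target, $fold_possessive) = @_;
    my $count = 0;
    my @words = split ' ', $text;
    for my $word (@words) {
        $word =~ s/^[[:punct:]]+//;               # Remove leading punct.
        $word =~ s/[[:punct:]]+$//;               # Remove trailing punct.
        $word =~ s/'s$// if $fold_possessive;     # Optionally fold the possessive.
        $count++ if lc($word) eq lc($target);
    }
    return $count;
}

my $sentence = "Was it John or John's brother?";
print "possessive kept:   ", count_word($sentence, 'John', 0), "\n";   # 1
print "possessive folded: ", count_word($sentence, 'John', 1), "\n";   # 2
```

Note that folding `'s` this way also rewrites contractions such as "it's", so it is only a starting point for deciding what "the same word" should mean.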
