PerlMonks
Re: Top five words by occurrence
by socketdave (Curate) on Jul 18, 2005 at 16:54 UTC
split is just dicing up your input on whitespace, so any punctuation stays attached to the words. A '$wd =~ s/\W//g;' before your '$count{$wd}++;' will strip everything other than letters, digits, and underscores (probably a bad idea if you need to deal with email addresses or URLs). You may also want '$count{lc($wd)}++;' to ignore capitalization. Update: as for just getting the 5 most common words, if each line of your script's output starts with the count, you can pipe it through: | sort -n | tail -n 5
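Here is a rough sketch of how those pieces could fit together in one script, doing the top-five selection in Perl instead of piping through sort and tail. The sub names count_words and top_words and the sample sentence are made up for illustration; they are not from the original script, which isn't shown in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count words in a string. Splits on whitespace,
# strips non-word characters, and folds case so "The" and "the" share a count.
sub count_words {
    my ($text) = @_;
    my %count;
    for my $wd (split ' ', $text) {
        $wd =~ s/\W//g;          # keeps only letters, digits, and underscore
        next unless length $wd;  # skip tokens that were pure punctuation
        $count{lc $wd}++;
    }
    return \%count;
}

# Hypothetical helper: return up to $n words, most frequent first.
# Ties are broken alphabetically so the order is repeatable.
sub top_words {
    my ($count, $n) = @_;
    my @sorted = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

my $sample = "The cat saw the dog. The dog saw a cat; the end!";  # made-up input
my $count  = count_words($sample);
printf "%d %s\n", $count->{$_}, $_ for top_words($count, 5);
```

Sorting inside Perl avoids depending on the output format, and the alphabetical tie-break is just one reasonable choice. The same caveat applies as above: s/\W//g will mangle email addresses and URLs.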