How do I count the frequency of words in a file and save them for later?

ghenry has asked for the wisdom of the Perl Monks concerning the following question:

Re: How do I count the frequency of words in a file and save them for later? by planetscape (Chancellor) on May 26, 2005 at 01:31 UTC
I usually use this script and pipe the results to a text file:

    #!/usr/local/bin/perl
    # $Id: wordfreq.perl,v 1.13 2001/05/16 23:46:40 doug Exp $
    # http://www.bagley.org/~doug/shootout/ <= old URL; dead now
    # http://dada.perl.it/shootout/wordfreq.perl.html <= URL as of time this post was written

    # Tony Bowden suggested using tr versus lc and split(/[^a-z]/)

    use strict;

    my %count = ();
    while (read(STDIN, $_, 4095) and $_ .= <STDIN>) {
        tr/A-Za-z/ /cs;
        ++$count{$_} foreach split(' ', lc $_);
    }

    my @lines = ();
    my ($w, $c);
    push(@lines, sprintf("%7d\t%s\n", $c, $w)) while (($w, $c) = each(%count));
    print sort { $b cmp $a } @lines;

planetscape
Re: Answer: How do I count the frequency of words in a file and save them for later? by TedPride (Priest) on May 26, 2005 at 06:13 UTC
There have been - and probably will be - quite a few posts regarding word counts. This solution doesn't work. To give just one example, "can't" ends up as 1 "can" and 1 "t". Other solutions often have it as "cant", but what is really needed is testing to see if the apostrophe has at least one letter on each side.

Also, what about end of line word splits? A word like:

    google-
    plex

should be converted to googleplex before counting. I imagine there are one or two other things to program in as well.

I'm not saying this is necessarily a bad place to start, but you need to program in some modifications. Better get cracking.
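The two fixes described above (keeping an apostrophe only when it has a letter on each side, and rejoining words split across a line end) can be sketched as follows. This is a minimal illustration, not code from the thread: `count_words` is a hypothetical helper, and it assumes that a hyphen directly before a newline always marks a split word, which will wrongly merge genuine compounds such as "well-known" when they happen to break at the hyphen.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count words in a block of text,
# keeping internal apostrophes and rejoining words hyphenated across lines.
sub count_words {
    my ($text) = @_;
    my %count;

    # Rejoin end-of-line splits: "google-\nplex" becomes "googleplex".
    # Assumption: a hyphen directly before a newline always marks a split word.
    $text =~ s/([A-Za-z])-[ \t]*\r?\n[ \t]*([A-Za-z])/$1$2/g;

    # A word is a run of letters. An apostrophe is kept only when it has at
    # least one letter on each side, so "can't" stays whole while a leading
    # or trailing apostrophe is dropped.
    while ($text =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g) {
        $count{ lc $1 }++;
    }
    return \%count;
}

my $counts = count_words("I can't find the google-\nplex. Can't you?\n");
printf "%7d\t%s\n", $counts->{$_}, $_ for sort keys %$counts;
```

The text has to be read in as a whole (or at least with the following line available), because the hyphen rule needs to see both sides of the line break.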
Re^2: Answer: How do I count the frequency of words in a file and save them for later? by planetscape (Chancellor) on Jun 13, 2005 at 06:54 UTC
Thanks for pointing that out, TedPride. I have only used this for relatively simple things. Normally, for my purposes, I use Ted Pedersen's Ngram Statistics Package.

planetscape
Re: How do I count the frequency of words in a file and save them for later? by whakka (Hermit) on Feb 04, 2009 at 17:43 UTC
Using a one-liner (and same formatting as previous posts): `$ perl -nle '$w{$_}++ for grep /\w/, map { s/[\. ,]*$//g; lc($_) } split; sub END { printf("%7d\t%s\n", $c, $w) while (($w,$c) = each(%w)) }' files...`
Re: How do I count the frequency of words in a file and save them for later? by rcaputo (Chaplain) on Feb 05, 2009 at 03:12 UTC
The standard UNIX tool chain works fine: `perl -nle "print for /(\w['\w-]*)/g" input.text | sort | uniq -c | sort -rn | tee word-list.text`
Re: How do I count the frequency of words in a file and save them for later? by bimleshsharma (Beadle) on Jul 01, 2011 at 08:19 UTC
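****** Counting an exact word in a file ******

    # Collect every word in abc.txt, then count exact matches of "search".
    open(my $fh, '<', 'abc.txt') or die "Cannot open abc.txt: $!";
    my @w;
    while (<$fh>) {
        push(@w, split(/\W+/, $_));
    }
    close $fh;

    my $c = 0;
    foreach (@w) {
        if ($_ eq "search") { $c++; }
    }
    # The inner quotes must be escaped inside a double-quoted string.
    print "\n Total count of \"search\" word is $c\n";

Originally posted as a Categorized Answer.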
Re: How do I count the frequency of words in a file and save them for later? by amitbhosale (Acolyte) on Feb 13, 2008 at 09:40 UTC
In my example, output is in the form of a perl hash(ref) structure. This lets you load it easily in another program using `do`.

    my %seen=();
    while(<>) {
        chomp;
        foreach my $word ( grep /\w/, split ) {
            $word =~ s/[. ,]*$//; # strip off punctuation, etc.
            $seen{$word}++;
        }
    }
    use Data::Dumper;
    $Data::Dumper::Terse = 1;
    print Dumper \%seen;

For example, given an input file containing:

    Click on a letter above to see phrasal verbs beginning with that letter.
    You will get a list of phrasal verbs and their definitions. If you then
    click on an individual verb, you will get more information on it,
    including example sentences, whether it is British or American English,
    and whether it is separable or not.

Output looks like:

    {
      'you' => 2,
      'a' => 2,
      'not' => 1,
      'that' => 1,
      'sentences' => 1,
      'individual' => 1,
      'see' => 1,
      'on' => 3,
      'American' => 1,
      'or' => 2,
      'verb' => 1,
      'Click' => 1,
      'list' => 1,
      'English' => 1,
      'letter' => 2,
      'their' => 1,
      'whether' => 2,
      'with' => 1,
      'and' => 2,
      'verbs' => 2,
      'of' => 1,
      'is' => 2,
      'definitions' => 1,
      'to' => 1,
      'above' => 1,
      'will' => 2,
      'If' => 1,
      'get' => 2,
      'including' => 1,
      'beginning' => 1,
      'it' => 3,
      'example' => 1,
      'information' => 1,
      'separable' => 1,
      'British' => 1,
      'click' => 1,
      'phrasal' => 2,
      'then' => 1,
      'You' => 1,
      'more' => 1,
      'an' => 1
    }
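For the "save them for later" half of the question, the round trip through `Data::Dumper` and `do` might look like the sketch below. The helper names `save_counts` and `load_counts` and the file name `word-counts.pl` are assumptions made for illustration. The path starts with `./` so that `do` reads that file directly rather than searching `@INC`.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helpers (names are illustrative, not from the thread).

# Write a hash of counts to $path as Perl source that evaluates to a hash ref.
sub save_counts {
    my ($counts, $path) = @_;
    local $Data::Dumper::Terse    = 1;   # emit a bare { ... } with no "$VAR1 ="
    local $Data::Dumper::Sortkeys = 1;   # stable key order in the saved file
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} Dumper($counts);
    close($out) or die "Cannot close $path: $!";
    return;
}

# Read the counts back. "do FILE" evaluates the file and returns its last value.
sub load_counts {
    my ($path) = @_;
    my $counts = do $path;
    die "Cannot parse $path: $@" if $@;
    die "Cannot read $path: $!"  unless defined $counts;
    return $counts;
}

my $file = './word-counts.pl';   # assumed file name
save_counts({ you => 2, on => 3, letter => 2 }, $file);
my $loaded = load_counts($file);
print "on => $loaded->{on}\n";
```

Because `do` executes whatever Perl code the file contains, this approach is only appropriate for files you wrote yourself.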
Re: How do I count the frequency of words in a file and save them for later? by amitbhosale (Acolyte) on Feb 13, 2008 at 08:09 UTC
This script counts the occurrences of each word in the file and prints a summary.

    #!/usr/bin/perl -w
    use strict;
    use IO::File;

    my %seen=();
    my $file_name="/home/myprog/matter";
    my $file=IO::File->new("< $file_name")
        or die "Couldn't open $file_name for reading:$! \n";
    my $line;
    while(defined($line=$file->getline())) {
        foreach my $word (split / /,$line) {
            chomp($word);
            if ($word =~ /\w+/) {
                $word =~ s/[. ,]$//;   # strip one trailing punctuation character
                if ($seen{$word}) {
                    my $count;
                    $count=$seen{$word};
                    $count=$count+1;
                    $seen{$word}=$count;
                }
                else {
                    $seen{$word}=1;
                }
            } # if ($word =~ /\w+/) ends here
        } # foreach my $word (split / /,$line) ends here
    } # while(defined($line=$file->getline())) ends here
    print "\n =============== o/p Word and its count frequency ========================";
    foreach my $val (keys %seen) {
        print "\n $val => $seen{$val}";
    }
    $file->close();

Let me know if any changes are required.

Originally posted as a Categorized Answer.
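As one possible change, the explicit read-increment-store branch can be collapsed into `$seen{$word}++`, since Perl creates a missing hash entry on first increment. The sketch below is a shorter variant under that assumption, not the poster's script: `count_from_handle` is a hypothetical helper, the demo reads from an in-memory filehandle instead of a real file, and it strips any run of trailing periods and commas rather than a single character.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-separated words read from a filehandle.
sub count_from_handle {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {   # split ' ' skips runs of whitespace
            $word =~ s/[.,]+$//;            # strip trailing periods and commas
            next unless $word =~ /\w/;      # ignore tokens with no word characters
            $seen{$word}++;                 # a missing entry starts at 1
        }
    }
    return \%seen;
}

# Demo on an in-memory filehandle; a real run would open a file instead.
my $text = "to be, or not to be.\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
my $seen = count_from_handle($fh);
close($fh);

# Most frequent first, ties broken alphabetically.
for my $word (sort { $seen->{$b} <=> $seen->{$a} or $a cmp $b } keys %$seen) {
    print "$word => $seen->{$word}\n";
}
```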