PerlMonks  

How do I count the frequency of words in a file and save them for later?

by ghenry (Vicar)
on May 17, 2005 at 13:11 UTC

ghenry has asked for the wisdom of the Perl Monks concerning the following question:

You want to search a file for unique words, count them, and print a summary, while also saving the results for later use.

Using a simple database (a dbm file), you could do (adapted from Perl Cookbook, 2ed):

#!/usr/bin/perl
use strict;
use warnings;

# Basename of Simple Database file
my $DBFILE = 'wordcount_db';

# open database, accessed through %WORDS
dbmopen (my %WORDS, $DBFILE, 0666)
    or die "Can't open $DBFILE: $!\n";

# Make a word frequency counter
while (<>) {
    while ( /(\w['\w-]*)/g ) {
        $WORDS{lc $1}++;
    }
}

# Output hash in a descending numeric sort of its values
foreach my $word ( sort { $WORDS{$b} <=> $WORDS{$a} } keys %WORDS) {
    printf "%5d %s\n", $WORDS{$word}, $word;
}

# Close the database
dbmclose %WORDS;
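To use the saved counts later, a second script can reopen the same dbm file and look words up. The following is only a minimal sketch: the sub name lookup_counts and the script name lookup.pl are made up for illustration, and it assumes the database was written by the script above, so the keys are lowercased.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the saved count for each requested word (0 if never seen).
# $dbfile is the same basename the counting script passed to dbmopen.
sub lookup_counts {
    my ($dbfile, @words) = @_;

    # A MODE of 0 tells dbmopen not to create a missing database.
    my %WORDS;
    dbmopen(%WORDS, $dbfile, 0)
        or die "Can't open $dbfile: $!\n";

    my %found;
    foreach my $word (@words) {
        my $key = lc $word;    # the counter stored lowercased keys
        $found{$key} = $WORDS{$key} || 0;
    }

    dbmclose %WORDS;
    return \%found;
}

# Usage: perl lookup.pl the perl monks   (lookup.pl is a hypothetical name)
if (@ARGV) {
    my $counts = lookup_counts('wordcount_db', @ARGV);
    printf "%5d %s\n", $counts->{$_}, $_ for sort keys %$counts;
}
```

Because the dbm file persists between runs, running the counting script again adds to the existing totals rather than starting from zero. Remove the database files first if a fresh count is wanted.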

TIMTOWTDI however.

Originally posted as a Categorized Question.


Replies are listed 'Best First'.
Re: How do I count the frequency of words in a file and save them for later?
by planetscape (Chancellor) on May 26, 2005 at 01:31 UTC

    I usually use this script and pipe the results to a text file:

    #!/usr/local/bin/perl
    # $Id: wordfreq.perl,v 1.13 2001/05/16 23:46:40 doug Exp $
    # http://www.bagley.org/~doug/shootout/  <= old URL; dead now
    # http://dada.perl.it/shootout/wordfreq.perl.html  <= URL as of time this post was written
    # Tony Bowden suggested using tr versus lc and split(/[^a-z]/)

    use strict;

    my %count = ();
    while (read(STDIN, $_, 4095) and $_ .= <STDIN>) {
        tr/A-Za-z/ /cs;
        ++$count{$_} foreach split(' ', lc $_);
    }

    my @lines = ();
    my ($w, $c);
    push(@lines, sprintf("%7d\t%s\n", $c, $w)) while (($w, $c) = each(%count));
    print sort { $b cmp $a } @lines;

    planetscape

      There have been - and probably will be - quite a few posts regarding word counts. This solution doesn't work. To give just one example, "can't" ends up as 1 "can" and 1 "t". Other solutions often have it as "cant", but what is really needed is testing to see if the apostrophe has at least one letter on each side. Also, what about end of line word splits? A word like:

      google-
      plex

      Should be converted to googleplex before counting. I imagine there are one or two other things to program in as well.

      I'm not saying this is necessarily a bad place to start, but you need to program in some modifications. Better get cracking.
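The two modifications mentioned above could be sketched roughly as follows. This is not a complete answer to the word-splitting problem, only an illustration. The sub name count_words is made up, and the rules are assumptions: a word is a run of ASCII letters, an apostrophe counts only with a letter on each side, and a hyphen directly before a line break marks a split word.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count words in a block of text and return a hash ref of word => count.
# Assumed rules (not from the posts above): ASCII letters only, apostrophes
# kept only between letters, hyphen + newline treated as a split word.
sub count_words {
    my ($text) = @_;
    my %count;

    # Rejoin words split across lines: "google-\nplex" becomes "googleplex".
    $text =~ s/([A-Za-z])-\r?\n\s*([A-Za-z])/$1$2/g;

    # Letters, optionally followed by groups of apostrophe + letters,
    # so "can't" stays whole but a quoting apostrophe is dropped.
    while ($text =~ /([A-Za-z]+(?:'[A-Za-z]+)*)/g) {
        $count{lc $1}++;
    }
    return \%count;
}

my $sample = "I can't find the google-\nplex; 'quoted' words can't hide.\n";
my $counts = count_words($sample);
printf "%5d %s\n", $counts->{$_}, $_ for sort keys %$counts;
```

One known weakness of this sketch: a genuinely hyphenated compound that happens to break at the end of a line (for example "well-" followed by "known") is also joined into a single unhyphenated word. Telling the two cases apart would need a dictionary or some other heuristic.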

Re: How do I count the frequency of words in a file and save them for later?
by whakka (Hermit) on Feb 04, 2009 at 17:43 UTC
    Using a one-liner (and same formatting as previous posts):
    $ perl -nle '$w{$_}++ for grep /\w/, map { s/[\. ,]*$//g; lc($_) } split; sub END { printf("%7d\t%s\n", $c, $w) while (($w,$c) = each(%w)) }' files...
Re: How do I count the frequency of words in a file and save them for later?
by rcaputo (Chaplain) on Feb 05, 2009 at 03:12 UTC

    The standard UNIX tool chain works fine:

    perl -nle "print for /(\w['\w-]*)/g" input.text | sort | uniq -c | sort -rn | tee word-list.text
Re: How do I count the frequency of words in a file and save them for later?
by bimleshsharma (Beadle) on Jul 01, 2011 at 08:19 UTC

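    ******** Counting an exact word in a file ***********

    # Read every word of abc.txt into @w
    open(my $fh, '<', 'abc.txt') or die "Can't open abc.txt: $!\n";
    my @w;
    while (<$fh>) {
        push @w, split /\W+/, $_;
    }
    close $fh;

    # Count exact (case-sensitive) matches of the word "search"
    my $c = 0;
    foreach (@w) {
        $c++ if $_ eq "search";
    }
    print "\n Total count of the word \"search\" is $c\n";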

    Originally posted as a Categorized Answer.

Re: How do I count the frequency of words in a file and save them for later?
by amitbhosale (Acolyte) on Feb 13, 2008 at 09:40 UTC
    In my example, the output is a Perl hash(ref) structure. This lets you load it easily in another program using do.
    my %seen = ();
    while (<>) {
        chomp;
        foreach my $word ( grep /\w/, split ) {
            $word =~ s/[. ,]*$//; # strip off punctuation, etc.
            $seen{$word}++;
        }
    }
    use Data::Dumper;
    $Data::Dumper::Terse = 1;
    print Dumper \%seen;
    For example, given an input file containing:
    Click on a letter above to see phrasal verbs beginning with that letter. You will get a list of phrasal verbs and their definitions. If you then click on an individual verb, you will get more information on it, including example sentences, whether it is British or American English, and whether it is separable or not.
    Output looks like:
    {
      'you' => 2,
      'a' => 2,
      'not' => 1,
      'that' => 1,
      'sentences' => 1,
      'individual' => 1,
      'see' => 1,
      'on' => 3,
      'American' => 1,
      'or' => 2,
      'verb' => 1,
      'Click' => 1,
      'list' => 1,
      'English' => 1,
      'letter' => 2,
      'their' => 1,
      'whether' => 2,
      'with' => 1,
      'and' => 2,
      'verbs' => 2,
      'of' => 1,
      'is' => 2,
      'definitions' => 1,
      'to' => 1,
      'above' => 1,
      'will' => 2,
      'If' => 1,
      'get' => 2,
      'including' => 1,
      'beginning' => 1,
      'it' => 3,
      'example' => 1,
      'information' => 1,
      'separable' => 1,
      'British' => 1,
      'click' => 1,
      'phrasal' => 2,
      'then' => 1,
      'You' => 1,
      'more' => 1,
      'an' => 1
    }
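Loading such a dump back with do might look like the sketch below. It writes a tiny three-word hash to a temporary file first so that it runs on its own. The temp file is only a stand-in for wherever the real output was saved, and the sample counts are taken from the output above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

# Write a small hash in Data::Dumper's terse form, as the script above does.
# The temp file stands in for a saved results file (hypothetical).
my %seen = (you => 2, on => 3, letter => 2);
my ($fh, $saved) = tempfile(UNLINK => 1);
{
    local $Data::Dumper::Terse = 1;
    print {$fh} Dumper(\%seen);
}
close $fh or die "Can't close $saved: $!\n";

# "do FILE" evaluates the file as Perl and returns its last value,
# which here is the anonymous hash that Dumper wrote.
my $loaded = do $saved
    or die "Can't load $saved: ", ($@ || $!), "\n";

printf "%s => %d\n", $_, $loaded->{$_} for sort keys %$loaded;
```

Two caveats: do runs whatever Perl code the file contains, so it should only be pointed at files you wrote yourself, and since Perl 5.26 a file in the current directory has to be given as "./name" because "." is no longer in @INC.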
Re: How do I count the frequency of words in a file and save them for later?
by amitbhosale (Acolyte) on Feb 13, 2008 at 08:09 UTC
    This script counts the occurrences of each word in the file and prints a summary.
    #!/usr/bin/perl -w
    use strict;
    use IO::File;

    my %seen = ();
    my $file_name = "/home/myprog/matter";
    my $file = IO::File->new("< $file_name")
        or die "Couldn't open $file_name for reading:$! \n";
    my $line;
    while (defined($line = $file->getline())) {
        foreach my $word (split / /, $line) {
            chomp($word);
            if ($word =~ /\w+/) {
                $word =~ s/[. ,]$//;
                if ($seen{$word}) {
                    my $count;
                    $count = $seen{$word};
                    $count = $count + 1;
                    $seen{$word} = $count;
                }
                else {
                    $seen{$word} = 1;
                }
            } # if ($word =~ /\w+/) end here
        } # foreach my $word (split / /,$line) end here
    } # while(defined($line=$file->getline()))
    print "\n =============== o/p Word and it's count frequency========================";
    foreach my $val (keys %seen) {
        print "\n $val => $seen{$val}";
    }
    $file->close();
    Let me know if any changes are required.

    Originally posted as a Categorized Answer.
