How can I print out a word-frequency or line-frequency summary?

by faq_monk (Initiate)
on Oct 08, 1999 at 00:25 UTC

Current Perl documentation can be found at perldoc.perl.org.

Here is our local, out-dated (pre-5.6) version:

To do this, you have to parse out each word in the input stream. We'll pretend that by "word" you mean a chunk of alphabetics, hyphens, or apostrophes, rather than the non-whitespace chunk idea of a word given in the previous question:

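    my %seen;    # word => number of occurrences
    while (<>) {
        # A word starts with a letter, then one or more word characters,
        # apostrophes, or hyphens, so one-letter words are not counted.
        while ( /(\b[^\W_\d][\w'-]+\b)/g ) {   # misses "`sheep'"
            $seen{$1}++;
        }
    }
    while ( my ($word, $count) = each %seen ) {
        print "$count $word\n";
    }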
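
To see what the pattern in the loop above actually picks up, here is a minimal sketch that wraps it in a helper and runs it on a sample string. The name words_in and the sample sentences are assumptions made up for illustration; they are not part of the FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every match of the FAQ's word pattern
# found in $text, in order of appearance.
sub words_in {
    my ($text) = @_;
    my @words;
    while ( $text =~ /(\b[^\W_\d][\w'-]+\b)/g ) {
        push @words, $1;
    }
    return @words;
}

# Hyphens and apostrophes inside a word are kept.
print join( "|", words_in("The well-known sheep don't bleat") ), "\n";

# One-letter words are skipped, because the pattern needs two characters.
print join( "|", words_in("I saw a sheep") ), "\n";
```

Matching is case-sensitive, so "The" and "the" would be counted as different words; lowercasing each line with lc before matching is one way around that.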

If you wanted to do the same thing for lines, you wouldn't need a regular expression:

    my %seen;    # line => number of occurrences
    while (<>) {
        $seen{$_}++;
    }
    while ( my ($line, $count) = each %seen ) {
        print "$count $line";    # $line still ends with its newline
    }

If you want the output sorted, see the section on Hashes in perlfaq4 ("How do I sort a hash (optionally by value instead of key)?").
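
As a rough illustration of sorted output, the sketch below orders a count hash by descending frequency and breaks ties alphabetically. The helper name sorted_summary and the sample counts are assumptions for this example only; the approach is the ordinary sort-on-hash-values idiom, not something specific to this FAQ.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to a %seen-style hash and returns
# "count word" strings, highest count first, ties in alphabetical order.
sub sorted_summary {
    my ($seen) = @_;
    return map  { "$seen->{$_} $_" }
           sort { $seen->{$b} <=> $seen->{$a} or $a cmp $b }
           keys %$seen;
}

# Sample data standing in for the %seen hash built while reading input.
my %seen = ( the => 3, sheep => 1, wolf => 2, and => 1 );
print "$_\n" for sorted_summary( \%seen );
```

For the line-frequency version, the keys already end in a newline, so the trailing "\n" in the print would be dropped.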
