PerlMonks  

Re: Counting words in headlines

by MarkM (Curate)
on Feb 04, 2003 at 09:21 UTC [id://232505]


in reply to Counting words in headlines

For an initial attempt, I would run with the following:

  1. Read the file in "paragraph" mode. Detect headlines by locating "paragraphs" that have only a single line of text.
  2. Store a count of each word found in headlines in a hash. Note: Force the keys to lowercase as a canonical representation.
  3. Look up each country in the hash to find its count. Note: Force lowercase here too, so the lookup matches the keys stored in step 2.

Example:

    # Maintain a word count for words found in header lines.
    my %header_words;

    # Read text in paragraph mode.
    $/ = '';

    # Read one paragraph at a time.
    while (<NEWS>) {
        # Only consider paragraphs that contain a single line of text.
        if (/\A\s*\S[^\r\n]*\s*\z/) {
            $header_words{lc $_}++ for /(\w+)/g;
        }
    }

    # For each country, obtain the word count.
    for my $country (@countries) {
        my $count = $header_words{lc $country} || 0;
        print "$count $country\n";
    }
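To try the approach without the original data, here is a minimal self-contained sketch. The sample text, the country list, and the helper name count_header_words are all made up for illustration; the real NEWS file and @countries come from the original question and are not shown in this thread. It reads from an in-memory filehandle and localizes $/ so that paragraph mode does not leak into the rest of the program.

```perl
use strict;
use warnings;

# Hypothetical sample input: headlines are single-line paragraphs,
# article bodies span more than one line.
my $news = <<'END';
France and Germany sign trade deal

Officials from both countries met on Monday
to finalise the agreement.

Germany beats Brazil

The match ended late
in the evening.
END

# Hypothetical country list.
my @countries = qw(France Germany Brazil Japan);

# Return a hash reference mapping each lowercased word to the number
# of times it occurs in single-line paragraphs of $text.
sub count_header_words {
    my ($text) = @_;
    my %header_words;

    # Treat the string as a file (in-memory filehandle).
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    # Paragraph mode, restricted to this subroutine.
    local $/ = '';

    while (my $para = <$fh>) {
        # Skip paragraphs that contain more than one line of text.
        next unless $para =~ /\A\s*\S[^\r\n]*\s*\z/;
        $header_words{lc $_}++ for $para =~ /(\w+)/g;
    }
    close $fh;

    return \%header_words;
}

my $counts = count_header_words($news);
for my $country (@countries) {
    my $count = $counts->{lc $country} || 0;
    print "$count $country\n";
}
```

With this sample the output should be 1 France, 2 Germany, 1 Brazil and 0 Japan. Note that the heuristic treats any single-line paragraph as a headline, so a one-line body paragraph would be counted as well.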

Replies are listed 'Best First'.
Re: Re: Counting words in headlines
by mooseboy (Pilgrim) on Feb 04, 2003 at 11:02 UTC

    Thanks, seems to work nicely!

    PS: trailing slash missing from (/\A\s*[^\r\n]+\s*\z)
