Re: Counting words in headlines

PerlMonks

by MarkM (Curate)

on Feb 04, 2003 at 09:21 UTC (id://232505)

in reply to Counting words in headlines

For an initial attempt, I would run with the following:

Read the file in "paragraph" mode. Detect headlines by locating "paragraphs" that consist of only a single line of text.
Store a count of each word found in the headline lines in a hash. Note: force lowercase as the canonical representation.
Look up each country in the hash to find its count. Note: force lowercase here too, as above.

Example:

# Maintain a word count for words found in header lines.
my %header_words;

# Read text in paragraph mode.
$/ = '';

# Read one paragraph at a time.
while (<NEWS>) {

    # Only consider paragraphs that contain a single line of text.
    if (/\A\s*\S[^\r\n]*\s*\z/) {
        $header_words{lc $_}++ for /(\w+)/g;
    }
}

# For each country, obtain the word count.
for my $country (@countries) {
    my $count = $header_words{lc $country} || 0;
    print "$count $country\n";
}
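
The snippet above assumes an already opened NEWS filehandle and a populated @countries array. As a minimal self-contained sketch of the same three steps, the version below reads from an in-memory string instead. The sample text, the country list, and the count_header_words helper are made up for illustration and are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data: headlines are single-line paragraphs,
# article bodies span several lines.
my $news = <<'END';
France and Germany sign accord

Leaders from both countries met on Monday
to discuss the details of the agreement.

GERMANY wins final

The match ended late in the evening
after extra time.
END

# Hypothetical list of countries to look up.
my @countries = ('France', 'Germany', 'Spain');

# Illustrative helper: returns a hash ref of lowercase word => count,
# counting only words that appear in single-line paragraphs.
sub count_header_words {
    my ($text) = @_;
    my %header_words;

    # Paragraph mode, restored automatically when the sub returns.
    local $/ = '';

    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    while (my $para = <$fh>) {
        # A headline is a paragraph made up of exactly one line of text.
        next unless $para =~ /\A\s*\S[^\r\n]*\s*\z/;
        $header_words{lc $1}++ while $para =~ /(\w+)/g;
    }
    close $fh;
    return \%header_words;
}

my $counts = count_header_words($news);
for my $country (@countries) {
    my $count = $counts->{lc $country} || 0;
    print "$count $country\n";
}
```

With this sample input, the loop prints "1 France", "2 Germany" and "0 Spain". Using local on $/ keeps paragraph mode from leaking into other code that reads files later in the same program.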

Re: Re: Counting words in headlines by mooseboy (Pilgrim) on Feb 04, 2003 at 11:02 UTC
Thanks, seems to work nicely! PS: trailing slash missing from `(/\A\s[^\r\n]+\s\z)`
