For an initial attempt, I would run with the following:
- Read the file in "paragraph" mode. Detect headlines by locating "paragraphs" that have only a single line of text.
- Store a word count for header lines into a hash. Note: Force lowercase as a canonical representation.
- Look up each country in the hash to find its count. Note: Force lowercase here too, so the keys match.
Example:
# Maintain a word count for words found in header lines.
my %header_words;
# Read text in paragraph mode: records are separated by one or more blank lines.
$/ = '';
# Read one paragraph at a time.
while (<NEWS>) {
    # Only consider paragraphs that contain a single line of text.
    if (/\A\s*\S[^\r\n]*\s*\z/) {
        $header_words{lc $_}++ for /(\w+)/g;
    }
}
# For each country, obtain the word count.
for my $country (@countries) {
    my $count = $header_words{lc $country} || 0;
    print "$count $country\n";
}