hash comparison

sarvan has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, I have the following script.

use strict;
use Data::Dumper;

my $candidate = 'the the the the the the the';
my @candidate_words = split (/\W/, $candidate);
my $candidate_count=@candidate_words;
my %candidate = ();
map { $candidate{$_}++ } @candidate_words;

my $reference = 'the cat is the on the mat';
my @reference_words = split (/\W/, $reference);
my %reference = ();
map { $reference{$_}++ } @reference_words;

while((my $key, my $val)=each(%candidate)){
print $key."->".$val."\n";
}

print "-------------------------------------\n";

while((my $key, my $val)=each(%reference)){
print $key."->".$val."\n";
}

I am writing this script to find the similarity between two sentences. The two sentences are stored in the $candidate and $reference variables.

In the current script, the program counts the occurrences of each word and stores them in a hash. Each hash now holds the words and their counts, one hash for the candidate sentence and one for the reference sentence.

The help I need is this: I want to take each word in the candidate and compare it against the two hashes to find the maximum reference count. For example, if the candidate contains the word "the", I want to find the count of this word in the %candidate hash as well as in the %reference hash, and take the minimum of the two counts (i.e. if 2 and 5 are the counts of "the" in the two hashes, I want 2). Likewise for all words in the candidate. Please help me with this. Thanks.


Replies are listed 'Best First'.
Re: hash comparison by GrandFather (Saint) on Jul 25, 2011 at 08:33 UTC
First off, a couple of style issues:

- If you find yourself writing the same code over and again, put it in a sub.
- Don't use map in place of for used as a statement modifier.

With that somewhat in mind consider the following:

    use strict;
    use warnings;
    use List::Util;

    my %candidate = CountWords ('the the the the the the the');
    my %reference = CountWords ('the cat is the on the mat');
    my %counts =
        map {$_ => List::Util::min ($candidate{$_} || 0, $reference{$_} || 0)}
        keys %candidate, keys %reference;

    print "$_: $counts{$_}\n" for sort keys %counts;

    sub CountWords {
        my ($sentence) = @_;
        my @words = split (/\W/, $sentence);
        my %wordCount;

        ++$wordCount{$_} for @words;
        return %wordCount;
    }

Prints:

    cat: 0
    is: 0
    mat: 0
    on: 0
    the: 3

True laziness is hard work
Re: hash comparison by choroba (Cardinal) on Jul 25, 2011 at 07:12 UTC
If you want minimum, just replace the last while loop with this:

    for my $key (keys %candidate) {
        print "$key -> ";
        my $cand = $candidate{$key};
        my $ref = $reference{$key};
        if ($ref and $ref < $cand) {
            print $ref;
        } else {
            print $cand;
        }
        print "\n";
    }

For maximum, invert the < sign.
Re^2: hash comparison by sarvan (Sexton) on Jul 25, 2011 at 07:47 UTC
Hi, thanks for the reply. Here is my doubt: suppose a word in the candidate is not present in the reference at all. In that case the code gives me a count of 1 (since it appeared once in $cand), but I expect zero, because it didn't appear in $ref. Example sentences: $candidate="it is not probable that it is the end"; $reference="it is unlikely that it is the end";
Re^3: hash comparison by choroba (Cardinal) on Jul 25, 2011 at 13:29 UTC
It is trivial to change the code I provided to give the expected result (in fact, you did not specify what to do if the term does not occur in the $reference, so I assumed you wanted to see the value from $candidate). Just remove 9 characters and add 5 somewhere :) (YMMW)
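One plausible reading of that hint, offered as a sketch rather than as the exact edit choroba had in mind: default a missing reference count to 0 and drop the `$ref and` guard, so a candidate word that never occurs in the reference comes out as 0. The `%result` hash is a hypothetical addition for illustration, and splitting on `/\W+/` is an assumption; the sentences are the ones from the follow-up question.

```perl
use strict;
use warnings;

my $candidate = 'it is not probable that it is the end';
my $reference = 'it is unlikely that it is the end';

# Build one word-count hash per sentence.
my (%candidate, %reference);
$candidate{$_}++ for split /\W+/, $candidate;
$reference{$_}++ for split /\W+/, $reference;

# For each candidate word keep the smaller of the two counts.
my %result;    # hypothetical name, not part of the original code
for my $key (sort keys %candidate) {
    my $cand = $candidate{$key};
    my $ref  = $reference{$key} || 0;    # absent from reference => 0
    $result{$key} = $ref < $cand ? $ref : $cand;
    print "$key -> $result{$key}\n";
}
```

With these two sentences, "not" and "probable" get 0 because they are missing from the reference, while "it" and "is" keep their shared count of 2.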
Re: hash comparison by FunkyMonk (Chancellor) on Jul 25, 2011 at 12:02 UTC
You would be better off splitting on `/\W+/`, rather than just `/\W/`. Consider:

    say scalar split /\W/, "the cat is the on the mat";        # 7
    say scalar split /\W/, "the  cat  is  the  on  the  mat";  # 13
    say scalar split /\W+/, "the cat is the on the mat";       # 7
    say scalar split /\W+/, "the  cat  is  the  on  the  mat"; # 7
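To see why this matters for the word-count hashes in this thread, here is a small sketch. The helper `count_words` and the sample sentence (with a double space and a comma followed by a space) are made up for illustration. Every pair of adjacent non-word characters yields an empty field under `/\W/`, and that empty string ends up as a key in the hash.

```perl
use strict;
use warnings;

# Hypothetical helper: count the fields produced by a given split pattern.
sub count_words {
    my ($pattern, $sentence) = @_;
    my %count;
    $count{$_}++ for split $pattern, $sentence;
    return %count;
}

# Made-up input: two spaces after the first "the", and ", " after "cat".
my $sentence = 'the  cat, is the on the mat';

my %single = count_words(qr/\W/,  $sentence);
my %plus   = count_words(qr/\W+/, $sentence);

printf "%-4s keys: %d, empty-string count: %d\n",
    '\W',  scalar(keys %single), $single{''} || 0;
printf "%-4s keys: %d, empty-string count: %d\n",
    '\W+', scalar(keys %plus),   $plus{''}   || 0;
```

With `/\W/` the hash picks up a spurious `''` key; with `/\W+/` only real words are counted.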
Re: hash comparison by Marshall (Canon) on Jul 25, 2011 at 08:15 UTC
Perhaps this helps... Adjust the printout as you want... I prefer "foreach" over "map" when there is no left hand value for the map.

    #!/usr/bin/perl -w
    use strict;

    my $candidate = "the the the the the the the";
    my $reference = "the cat is the on the mat";

    my %cand_histogram;
    my %ref_histogram;

    $cand_histogram{$_}++ foreach (split(/\s+/,$candidate));
    $ref_histogram{$_}++ foreach (split(/\s+/,$reference));

    my %seen;
    printf "%-6s %-10s %-10s\n", 'Key','Candidate','Reference';
    foreach my $key ( sort { $seen{$b} <=> $seen{$a}  #descending word cnt
                             or
                             $a cmp $b                #alphabetic otherwise
                           }
                      grep {!$seen{$_}++}  # each key just once,
                                           # but count 'em also!,
                                           # 2 means => in both hashes
                      (keys %cand_histogram, keys %ref_histogram)
                    )
    {
        printf "%-6s %-10s %-10s\n",
               $key, $cand_histogram{$key}||='0', $ref_histogram{$key} ||='0';
    }
    __END__
    OUTPUT:
    Key    Candidate  Reference
    the    7          3
    cat    0          1
    is     0          1
    mat    0          1
    on     0          1
Re^2: hash comparison by sarvan (Sexton) on Jul 25, 2011 at 10:41 UTC
Hi Marshall, thanks for the code. There is one small modification I want to make. Right now it gives me all the words in both the candidate and the reference, along with their counts. The output I am looking for is only the minimum count of each candidate word across the candidate and the reference. For example, if the word "the" appears 7 times in the candidate and 2 times in the reference, I should get 2 as the minimum of the two counts, and likewise for all the words in the candidate alone. Please give me an idea how to do this. I will try. Thanks...
Re^3: hash comparison by Marshall (Canon) on Jul 25, 2011 at 12:17 UTC
Hi sarvan, I think that if you study the code, you will find that you have all that you need. The last "foreach" loop is on the fancy side of things, but it just loops over all of the unique keys in a special sort order. `$cand_histogram{$key}||='0'` uses 0 as the value in the case that there is no value for `$cand_histogram{$key}`. The print statement prints the 3 things that you need in order to calculate what you want. Why don't you give some code a try? Post your effort back here after you study it a bit.
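For readers following along, a minimal sketch of where that advice could lead, assuming the two histograms are built as in the code above. The `%min_count` hash and the extra "Min" column are hypothetical additions, and `List::Util::min` stands in for a hand-written comparison. The loop runs over the candidate keys only, which is what the follow-up question asks for.

```perl
use strict;
use warnings;
use List::Util qw(min);

my $candidate = "the the the the the the the";
my $reference = "the cat is the on the mat";

my (%cand_histogram, %ref_histogram);
$cand_histogram{$_}++ foreach split /\s+/, $candidate;
$ref_histogram{$_}++  foreach split /\s+/, $reference;

# Visit candidate words only; a word that is absent from the
# reference is treated as having a reference count of 0.
my %min_count;    # hypothetical name, not from the thread
printf "%-6s %-10s %-10s %-5s\n", 'Key', 'Candidate', 'Reference', 'Min';
foreach my $key (sort keys %cand_histogram) {
    my $ref = $ref_histogram{$key} || 0;
    $min_count{$key} = min($cand_histogram{$key}, $ref);
    printf "%-6s %-10s %-10s %-5s\n",
        $key, $cand_histogram{$key}, $ref, $min_count{$key};
}
```

For the sample sentences the only candidate word is "the", with counts 7 and 3, so the minimum reported is 3.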

