http://qs321.pair.com?node_id=239809

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am having some trouble writing a Perl program that computes similarity. Could anyone help? Thanks.

Similarity is calculated with a formula like this:

Similarity = 2 x (intersection / total)
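
To make the formula concrete, here is a small sketch. It assumes that "total" means the number of distinct words in the first document plus the number of distinct words in the second. The sub name `similarity` and the sample words are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes two hash references used as word sets.
# Assumes "total" is the size of the first set plus the size of the second.
sub similarity {
    my ($set1, $set2) = @_;
    my $total = keys(%$set1) + keys(%$set2);
    return 0 unless $total;    # avoid dividing by zero for two empty sets

    my $intersection = 0;
    foreach my $word (keys %$set1) {
        ++$intersection if exists $set2->{$word};
    }
    return 2 * ($intersection / $total);
}

my %doc1 = map { $_ => 1 } qw(perl hash array);
my %doc2 = map { $_ => 1 } qw(perl hash file stoplist words);

# 2 words in common, 3 + 5 = 8 words in total: 2 * (2 / 8) = 0.5
print similarity(\%doc1, \%doc2), "\n";
```

With this definition, two identical word sets score 1 and two sets with nothing in common score 0.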

I tried to solve the problem, but I am stuck in the middle. The program needs to read a stoplist and filter the stoplist words out before it works with the rest of the words in the files. The main point is to take one file and compare it with each of the other files.

However, while writing it I could not work out how to convert between a hash and an array, so I am stuck.
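
Since the sticking point is moving between hashes and arrays, the following sketch shows the usual idioms, using only built-ins (`keys`, hash slices, `map`). The variable names and sample data are illustrative.

```perl
use strict;
use warnings;

my @words = qw(apple banana apple cherry);

# Array -> hash: each word becomes a key, so duplicates collapse.
my %seen = map { $_ => 1 } @words;

# The same conversion written with a hash slice.
my %seen2;
@seen2{@words} = (1) x @words;

# Hash -> array: keys returns the words (in no particular order),
# so sort them if a stable order is wanted.
my @unique = sort keys %seen;

# In scalar context, keys gives the number of distinct words.
my $count = keys %seen;

print "@unique\n";    # apple banana cherry
print "$count\n";     # 3
```

Note that a hash keeps each word only once, so converting an array to a hash and back also removes duplicates.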

Here is my script; I hope someone can help me:

#!/usr/local/bin/perl -w
use strict;

my $stopfile = 'stopwords';
my $base     = shift @ARGV;
my @files    = @ARGV;

my %stopwords       = ();
my %basefilterwords = ();
my %filterwords     = ();

# Load the stoplist, one word per line.
open STOP, "<$stopfile" or die "Cannot open $stopfile: $!";
while (my $stopword = <STOP>) {
    chomp $stopword;
    $stopwords{$stopword} = 1;
}
close STOP;

# Collect the words of the base file that are not in the stoplist.
open BASETEXT, "<$base" or die "Cannot open $base: $!";
while (my $line = <BASETEXT>) {
    my @basewords = split /\W/, $line;
    foreach my $baseword (@basewords) {
        next if $baseword eq '';
        $baseword = lc $baseword;
        $basefilterwords{$baseword} = 1 unless $stopwords{$baseword};
    }
}
close BASETEXT;

# Collect the words of each remaining file that are not in the stoplist.
foreach my $file (@files) {
    %filterwords = ();    # start with an empty word set for every file
    open TEXT, "<$file" or die "Cannot open $file: $!";
    while (my $line = <TEXT>) {
        my @words = split /\W/, $line;
        foreach my $word (@words) {
            next if $word eq '';
            $word = lc $word;
            $filterwords{$word} = 1 unless $stopwords{$word};
        }
    }
    close TEXT;
}

That is as far as I got: the words are filtered, but then I am stuck, because I do not know how to convert this into arrays. Here is the other part:

# $D1 and $D2 hold the text of the two documents
my @D1 = map { lc $_ } $D1 =~ /(\w+)/g;
my @D2 = map { lc $_ } $D2 =~ /(\w+)/g;

my %D2 = ();
@D2{@D2} = (1) x scalar @D2;

my $total = scalar(@D1) + scalar(@D2);
my $intersection = 0;

# count the number of words in common
foreach my $word (@D1) {
    ++$intersection if $D2{$word};
}

my $similarity = 2 * ($intersection / $total);
print "\n$similarity\n\n";

I am sure this part needs some changes, but I really do not understand what to change. I hope someone can help me solve it. Thanks.
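
One way the two halves could be joined is to keep every document as a hash of filtered words and compute the similarity straight from those hashes, so that no conversion back to arrays is needed. The sketch below uses in-memory strings instead of files to stay self-contained. The names `word_set` and `similarity`, the stoplist, and the sample texts are all invented for illustration, and "total" is assumed to be the sum of the two set sizes.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the words of $text, drop stopwords,
# and return the remaining distinct words as a hash reference.
sub word_set {
    my ($text, $stop) = @_;
    my %set;
    foreach my $word (map { lc $_ } $text =~ /(\w+)/g) {
        $set{$word} = 1 unless $stop->{$word};
    }
    return \%set;
}

# Hypothetical helper: 2 * (intersection / total), with total assumed
# to be the number of words in set 1 plus the number in set 2.
sub similarity {
    my ($set1, $set2) = @_;
    my $total = keys(%$set1) + keys(%$set2);
    return 0 unless $total;
    # grep in scalar context counts the words present in both sets.
    my $intersection = grep { exists $set2->{$_} } keys %$set1;
    return 2 * ($intersection / $total);
}

my %stopwords = map { $_ => 1 } qw(the a of is);

my $base = word_set('The cat is on the mat', \%stopwords);
my %others = (
    doc1 => 'A cat on a mat',
    doc2 => 'The price of oil',
);

# Compare the base document with each of the other documents.
foreach my $name (sort keys %others) {
    my $set = word_set($others{$name}, \%stopwords);
    printf "%s: %.2f\n", $name, similarity($base, $set);
}
```

In the script above, `%basefilterwords` and `%filterwords` would play the roles of the two sets, with the similarity computed inside the `foreach my $file` loop after each file has been read.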