How can one get the unique words in text files using perl module List::Compare?

by supriyoch_2008 (Monk)
on Mar 05, 2014 at 06:41 UTC

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks,

I want to find the words that are unique to each of several text files (say three files: 1.txt, 2.txt and 3.txt), each containing two comma-separated words. I have written a script, m.pl (given below). The first part of the script works, but at the end it fails to print the unique words for each file. The three text files are given below as well. I have observed that the three hexadecimal values printed for the final array are exactly the same. I look forward to the Monks' suggestions for solving this problem.

Here goes the script m.pl:

# To find unique words in files:
#!/usr/bin/perl
use warnings;
use List::Compare; # Module

# To enter many files:
my $entry;
my @a;

# Use of do-until loop:
do {
    print "\n\n Press 1 to enter a new file or 2 to exit: \n"; # do LOOP
    $entry = <STDIN>;
    chomp ($entry);
    $file_no = 0;
    if ($entry == 1) { # if LOOP starts
        $file_no++;
        # Upload the files:
        print "\n Enter the filename: ";
        $file = <STDIN>;
        chomp $file;
        ############################
        # open the file, or exit:
        ############################
        unless ( open(FILE, $file) ) {
            print "Cannot open file \"$file\"\n\n";
            exit;
        }
        @string = <FILE>;
        close FILE;
        $string = join(" ", @string);
        $string =~ s/\s//gi;
        push @all, $string;
    } # End of if LOOP
} until ($entry == 2); # End of do-until LOOP
###############################################
################################################
$fnum = 0;
$wnum = 0;
foreach $item (@all) { # 1st foreach LOOP for all files
    $fnum++;
    @file_words = (); # To empty array
    while ($item =~ /[A-Za-z].*?,/g) { # While LOOP2 starts
        $wnum++;
        $word = $&;
        $word =~ s/,//g;
        $word =~ s/\s//g;
        push @file_words, $word;
    } # while LOOP2 ends for all words in file
    $num_all_words = @file_words;
    print "\n File No. $fnum Total Words: $num_all_words Words are:\n";
    print join ("\n", @file_words);
    print "\n\n\n";
    push @s, \@file_words;
} # foreach LOOP ends
$ele_no = @s;
print "\n\n Elements No.: $ele_no\n Final Array: @s\n\n";
$all = List::Compare->new(@s); # Use of module function
@file1 = $all->get_unique(0);
@file2 = $all->get_unique(1);
@file3 = $all->get_unique(2);
print "\n\n Unique Words in Files:\n\n file1: @file1\n file2: @file2\n file3: @file3\n\n";
exit;
####################

The three text files, 1.txt, 2.txt and 3.txt, are given below:

1.txt:
bat, cat,

2.txt:
rat, bat,

3.txt:
bat, dog,

Here goes the incorrect result of m.pl:

C:\Users\x\Desktop>m.pl

Press 1 to enter a new file or 2 to exit:
1

Enter the filename: 1.txt

Press 1 to enter a new file or 2 to exit:
1

Enter the filename: 2.txt

Press 1 to enter a new file or 2 to exit:
1

Enter the filename: 3.txt

Press 1 to enter a new file or 2 to exit:
2

File No. 1 Total Words: 2 Words are:
bat
cat

File No. 2 Total Words: 2 Words are:
rat
bat

File No. 3 Total Words: 2 Words are:
bat
dog

Elements No.: 3
Final Array: ARRAY(0x2902f5c) ARRAY(0x2902f5c) ARRAY(0x2902f5c)

Unique Words in Files:

file1:
file2:
file3:

####################

Correct result of m.pl (last part) should look like:

Elements No.: 3
Final Array: ???

Unique Words in Files:

file1: cat
file2: rat
file3: dog
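
Editorial note on the identical ARRAY(0x...) values: the script pushes \@file_words on every pass, and @file_words is a single array that is emptied and refilled each time, so every element of @s refers to that same array. The sketch below is a minimal illustration of this, not the poster's code. The helper names collect_shared and collect_separate and the hard-coded word lists are assumptions made for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the question: one array declared outside the loop,
# refilled on each pass, with a reference to it pushed every time.
sub collect_shared {
    my @lists = @_;
    my (@file_words, @refs);
    for my $words (@lists) {
        @file_words = @$words;       # overwrites the same array
        push @refs, \@file_words;    # same reference on every pass
    }
    return @refs;
}

# One possible fix: declare the array inside the loop,
# so each pass creates a new array with its own reference.
sub collect_separate {
    my @lists = @_;
    my @refs;
    for my $words (@lists) {
        my @file_words = @$words;    # fresh array per pass
        push @refs, \@file_words;
    }
    return @refs;
}

# Hypothetical stand-ins for the words read from 1.txt, 2.txt and 3.txt.
my @lists = ( [qw(bat cat)], [qw(rat bat)], [qw(bat dog)] );

my @shared   = collect_shared(@lists);
my @separate = collect_separate(@lists);

print "shared:   @shared\n";      # the same ARRAY(0x...) three times
print "separate: @separate\n";    # three different ARRAY(0x...) values
print "first shared list holds: @{ $shared[0] }\n";
```

With distinct references, List::Compare->new(@s) would receive three different lists, and get_unique(0), get_unique(1) and get_unique(2) would then address them by position, as the script already intends. Pushing a copy with [ @file_words ] instead of \@file_words should have the same effect.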

Replies are listed 'Best First'.
Re: How can one get the unique words in text files using perl module List::Compare?
by choroba (Cardinal) on Mar 05, 2014 at 06:59 UTC
    The problem is you are not looking for unique words in each file, but for unique words across all the files. Do I understand your requirement correctly?
    #!/usr/bin/perl
    use warnings;
    use strict;

    my %seen;
    for my $file (@ARGV) {
        open my $FH, '<', $file or die $!;
        while (<$FH>) {
            chomp;
            s/,\s*$//;
            push @{ $seen{$_} }, $file;
        }
    }

    for my $string (grep 1 == @{ $seen{$_} }, keys %seen) {
        print "$string unique in $seen{$string}[0]\n";
    }

    $ perl 1077045.pl [123].txt
    cat unique in 1.txt
    dog unique in 3.txt
    rat unique in 2.txt

    If the word is considered "unique" even if it occurs several times in one file, you might need to use a hash of hashes (HoH) instead of a hash of arrays (HoA).
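
A minimal sketch of the hash-of-hashes idea, added for illustration and not part of the original reply. The helper name unique_across_files and the in-memory word lists are assumptions; in practice the lists would be filled by reading the files. Keying the inner hash by file name means a word repeated within one file still counts as belonging to a single file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes { file name => [ words ] } and returns
# { word => file name } for every word that occurs in exactly one file.
sub unique_across_files {
    my ($words_in) = @_;

    # HoH: $seen{$word}{$file} counts occurrences of $word in $file.
    my %seen;
    for my $file (keys %$words_in) {
        $seen{$_}{$file}++ for @{ $words_in->{$file} };
    }

    # A word is unique when its inner hash has exactly one file key,
    # however many times the word was counted in that file.
    my %unique;
    for my $word (keys %seen) {
        my @files = keys %{ $seen{$word} };
        $unique{$word} = $files[0] if @files == 1;
    }
    return \%unique;
}

# Sample data modelled on the thread; "cat" is repeated on purpose.
my $unique = unique_across_files({
    '1.txt' => [qw(bat cat cat)],
    '2.txt' => [qw(rat bat)],
    '3.txt' => [qw(bat dog)],
});

print "$_ unique in $unique->{$_}\n" for sort keys %$unique;
```

With a hash of arrays, the repeated "cat" would push 1.txt twice and the count test 1 == @{ $seen{$_} } would reject it; the inner hash collapses the repeats into one key.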

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Hi choroba,

      Thanks for your suggestions.

Node Type: perlquestion [id://1077045]
Approved by shmem