Re^4: Word Count and Match

@Dave: There actually is a problem with the code. I've posted the modified code and its output below. As you can see, it lists the names twice instead of just once. That's why I wanted to match the data exactly, using the specific field in the file.

#!/usr/bin/env perl

use strict;
use warnings;

my %count;

my $namecnt='David|Tom|Sam|Will|Dave|William|Thomas';

while(<DATA>){

    my @words = split(":");

    foreach my $word (@words){
        if($word=~/($namecnt)/io){
            $count{$1}++;
        }
    }
}

foreach my $word (sort keys %count) {
    printf("(STDOUT) %39s  %-14s  %-19s  %6s", "There are", $count{$word}, $word, "Name(s)\n");
    print "(OUTPUT) There are $count{$word} $word Name(s)\n";
}

__DATA__
1:NAME:Bob:Bobville:Phone
2:NAME:Dave:Davis:Phone
3:NAME:Will:Willard:Phone
4:NAME:Todd:Toadlane:Phone

(STDOUT) There are 1 Dave Name(s)
(OUTPUT) There are 1 Dave Name(s)
(STDOUT) There are 2 Will Name(s)
(OUTPUT) There are 2 Will Name(s)

Re^5: Word Count and Match by choroba (Cardinal) on Jan 08, 2021 at 13:44 UTC
When matching, use anchors in the regex to only match the whole word:

`my $namecnt = qr/^(David|Tom|Sam|Will|Dave|William|Thomas)$/;`

`^` matches at the beginning, `$` matches at the end. The match will now look like this (I removed the `/o` as it's not recommended):

`if ($word =~ /$namecnt/i) {`

Note that the first argument to split is a regex. It's clearer to write

`my @words = split /:/;`
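A minimal sketch of the anchored approach, applied to the sample records from the question. The sub `count_names` is a hypothetical helper, not part of the original code. The `/i` flag is placed on the `qr//` itself, on the assumption that case-insensitive matching is wanted: a precompiled `qr//` object keeps its own flags, so an `/i` on the enclosing match does not reach inside it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Anchored, case-insensitive pattern: a field must equal one of the names.
my $name_re = qr/^(David|Tom|Sam|Will|Dave|William|Thomas)$/i;

# Hypothetical helper: counts whole-field name matches in colon-separated records.
sub count_names {
    my @records = @_;
    my %count;
    for my $record (@records) {
        chomp(my $line = $record);
        for my $field (split /:/, $line) {
            $count{$1}++ if $field =~ $name_re;
        }
    }
    return \%count;
}

my $count = count_names(
    '1:NAME:Bob:Bobville:Phone',
    '2:NAME:Dave:Davis:Phone',
    '3:NAME:Will:Willard:Phone',
    '4:NAME:Todd:Toadlane:Phone',
);

print "There are $count->{$_} $_ Name(s)\n" for sort keys %$count;
```

With these records, `Willard` and `Davis` no longer contribute, so the sketch should report one `Dave` and one `Will`. Keys are stored as spelled in the data, so `WILL` and `Will` would be counted separately unless the key is normalized first.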
Re^5: Word Count and Match by LanX (Saint) on Jan 08, 2021 at 13:41 UTC
> As you can see it lists the names twice instead of just once.

Do you understand the effects of `print` and `printf`?

Cheers Rolf
(addicted to the Perl Programming Language :)
Re^5: Word Count and Match by davido (Cardinal) on Jan 08, 2021 at 19:58 UTC
No, there is no problem with the code's output. Your original code was printing using printf to STDOUT, and it was printing via print to the OUTPUT file handle. Two distinct destinations; one the terminal, the other a file.

For the purposes of this demonstration I switched it so that both print to STDOUT, because printing to a file adds complexity to the example that you don't need. But in the print output I identify which output stream your original code would have printed to. So it is intentional that we are getting output twice. As you can see, it is not identical output. And as you can see from the code, it is because we have a printf statement and a print statement, just as your original code did.

So in other words, you still haven't shown us the code that is misbehaving, unless the misbehavior was that you were printing to STDOUT and to an output file because you have two print statements. Do you need for us to suggest removing one of the two print statements?

Dave
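To make the two destinations concrete, here is a minimal sketch that assumes the second handle writes to an in-memory string rather than a real file. The handle `$out`, the buffer `$file_contents`, and the helper `report` are hypothetical stand-ins for the `OUTPUT` handle in the original code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: writes one report line per name to the given handle.
sub report {
    my ($fh, $count) = @_;
    for my $name (sort keys %$count) {
        printf {$fh} "There are %d %s Name(s)\n", $count->{$name}, $name;
    }
    return;
}

my %count = (Dave => 1, Will => 2);

# Stand-in for the OUTPUT file: an in-memory file handle.
my $file_contents = '';
open my $out, '>', \$file_contents or die "Cannot open in-memory handle: $!";

report(\*STDOUT, \%count);   # lines appear on the terminal
report($out,     \%count);   # the same lines go to the "file"
close $out;

# Each destination received every line exactly once.
my $lines_in_file = () = $file_contents =~ /\n/g;
print "Lines written to the file: $lines_in_file\n";
```

Each name appears once per destination. The output only looks doubled when both statements are pointed at the same stream.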
Re^5: Word Count and Match by eyepopslikeamosquito (Archbishop) on Jan 08, 2021 at 15:51 UTC
I see you're still using the `/o` modifier in:

`if($word=~/($namecnt)/io){`

Did you bother to read and understand my earlier reply?
Re^5: Word Count and Match by Marshall (Canon) on Jan 09, 2021 at 20:16 UTC
The first problem that I had with your code was the confusing name, $namecnt. I expected a numeric scalar for that type of name! I changed that var name to "$names2cnt" to imply multiple names which will be counted - I assume in a case insensitive manner.

Added: William test case

use strict;
use warnings;

my %name_count;

my $names2cnt='David|Tom|Sam|Will|Dave|William|Thomas';

while (my $line = <DATA>) {
    next unless ($line =~ /\S/);   #skip blank lines
    my $name = (split (":",$line))[2];
    $name_count{$1}++ if $names2cnt =~ /\b($name)\b/i;
}

foreach my $name (sort keys %name_count) {
    print "$name => $name_count{$name}\n";
}

=prints:
Dave => 1
Will => 3      #allows Will and WILL and WiLL spellings
William => 1
=cut

__DATA__
1:NAME:Bob:Bobville:Phone
2:NAME:Dave:Davis:Phone
3:NAME:Will:Willard:Phone
4:NAME:Todd:Toadlane:Phone
5:NAME:WILL:Street:Phone
6:NAME:WiLL:Street2:phone2
7:NAME:WilliaM:xyz:1234

Update: I'm not sure that this \b stuff in the regex is necessary. I put some obvious test cases into the code, but not all possible test cases.
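Since the name always sits in the third colon-separated field, a hash lookup is one way to sidestep the `\b` question entirely. In the regex version the `\b` anchors do matter, because without them a field such as `Dav` would match inside `David`. The following is a sketch under that assumption, not a replacement for the code above; `%wanted` and `count_field` are hypothetical names, and counts are keyed by the canonical spelling from the list.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Map each lowercased name to its canonical spelling for case-insensitive lookup.
my %wanted = map { lc($_) => $_ } qw(David Tom Sam Will Dave William Thomas);

# Hypothetical helper: counts exact matches of the third field against %wanted.
sub count_field {
    my @records = @_;
    my %name_count;
    for my $line (@records) {
        next unless $line =~ /\S/;              # skip blank lines
        my $name = (split /:/, $line)[2];
        next unless defined $name;              # skip records with too few fields
        my $canonical = $wanted{ lc $name } or next;
        $name_count{$canonical}++;
    }
    return \%name_count;
}

my $name_count = count_field(
    '1:NAME:Bob:Bobville:Phone',
    '2:NAME:Dave:Davis:Phone',
    '3:NAME:Will:Willard:Phone',
    '4:NAME:Todd:Toadlane:Phone',
    '5:NAME:WILL:Street:Phone',
    '6:NAME:WiLL:Street2:phone2',
    '7:NAME:WilliaM:xyz:1234',
);

print "$_ => $name_count->{$_}\n" for sort keys %$name_count;
```

Because only field `[2]` is inspected and the comparison is an exact, case-folded lookup, surnames such as `Willard` never enter the count and names containing regex metacharacters need no escaping.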


	PerlMonks