
Re^4: Word Count and Match

by PilotinControl (Pilgrim)
on Jan 08, 2021 at 13:30 UTC

in reply to Re^3: Word Count and Match
in thread Word Count and Match

@Dave: There actually is a problem with the code. I've modified the code below and the output. As you can see it lists the names twice instead of just once. That's why I wanted to match the data exactly using the specific field in the file.

#!/usr/bin/env perl
use strict;
use warnings;

my %count;
my $namecnt='David|Tom|Sam|Will|Dave|William|Thomas';

while(<DATA>){
    my @words = split(":");
    foreach my $word (@words){
        if($word=~/($namecnt)/io){
            $count{$1}++;
        }
    }
}

foreach my $word (sort keys %count) {
    printf("(STDOUT) %39s %-14s %-19s %6s", "There are", $count{$word}, $word, "Name(s)\n");
    print "(OUTPUT) There are $count{$word} $word Name(s)\n";
}

__DATA__
1:NAME:Bob:Bobville:Phone
2:NAME:Dave:Davis:Phone
3:NAME:Will:Willard:Phone
4:NAME:Todd:Toadlane:Phone

Output:

(STDOUT) There are 1 Dave Name(s)
(OUTPUT) There are 1 Dave Name(s)
(STDOUT) There are 2 Will Name(s)
(OUTPUT) There are 2 Will Name(s)

Replies are listed 'Best First'.
Re^5: Word Count and Match
by choroba (Archbishop) on Jan 08, 2021 at 13:44 UTC
    When matching, use anchors in the regex to only match the whole word:
    my $namecnt = qr/^(David|Tom|Sam|Will|Dave|William|Thomas)$/i;
    ^ matches at the beginning, $ matches at the end.

    The match will now look like (I removed the /o as it's not recommended)

    if ($word =~ /$namecnt/i) {

    Note that the first argument to split is a regex. It's clearer to write

    my @words = split /:/;
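
Putting those suggestions together, here is a minimal sketch of how the counting loop might look. The helper name count_names and the in-line record list are assumptions made for illustration (the original script reads from DATA), and the /i flag is placed on the qr// itself so the compiled pattern carries it.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Anchored pattern: the whole field must be exactly one of the names.
# The /i flag lives on the qr// so it travels with the compiled regex.
my $name_re = qr/^(David|Tom|Sam|Will|Dave|William|Thomas)$/i;

# count_names is a hypothetical helper, not part of the original script.
# It takes colon-separated records and returns a hash ref: name => count.
sub count_names {
    my @records = @_;
    my %count;
    for my $record (@records) {
        chomp(my $line = $record);
        for my $field (split /:/, $line) {
            # $1 holds the field text as it appears in the data
            $count{$1}++ if $field =~ $name_re;
        }
    }
    return \%count;
}

my @records = (
    "1:NAME:Bob:Bobville:Phone\n",
    "2:NAME:Dave:Davis:Phone\n",
    "3:NAME:Will:Willard:Phone\n",
    "4:NAME:Todd:Toadlane:Phone\n",
);

my $count = count_names(@records);
print "There are $count->{$_} $_ Name(s)\n" for sort keys %$count;
# There are 1 Dave Name(s)
# There are 1 Will Name(s)
```

With the anchors in place, Willard no longer counts as a second Will. Note that the hash key is whatever casing appears in the data, so WILL and Will would be counted separately in this sketch.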

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^5: Word Count and Match
by LanX (Cardinal) on Jan 08, 2021 at 13:41 UTC
    > As you can see it lists the names twice instead of just once.

    Do you understand the effects of print and printf?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

Re^5: Word Count and Match
by davido (Cardinal) on Jan 08, 2021 at 19:58 UTC

    No, there is no problem with the code's output. Your original code was printing using printf to STDOUT, and it was printing via print to the OUTPUT file handle. Two distinct destinations; one the terminal, the other a file. For the purposes of this demonstration I switched it so that both print to STDOUT, because printing to a file adds complexity to the example that you don't need. But in the print output I identify which output stream your original code would have printed to. So it is intentional that we are getting output twice. As you can see it is not identical output. And as you can see from the code, it is because we have a printf statement, and a print statement, just as your original code did.

    So in other words, you still haven't shown us the code that is misbehaving, unless the misbehavior was that you were printing to STDOUT and to an output file because you have two print statements. Do you need for us to suggest removing one of the two print statements?
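
To make the two destinations concrete, here is a small sketch. It assumes an in-memory file handle as a stand-in for the original OUTPUT file, and the %count values are hard-coded for illustration rather than computed from the data.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the OUTPUT file: an in-memory file handle.
my $file_contents = '';
open(my $out, '>', \$file_contents)
    or die "Cannot open in-memory file: $!";

# Illustrative counts only; the real script builds these from its input.
my %count = (Dave => 1, Will => 1);

for my $name (sort keys %count) {
    # First statement: formatted line to the terminal (STDOUT)
    printf "%-10s %3d %s Name(s)\n", 'There are', $count{$name}, $name;
    # Second statement: plain line to the file handle
    print {$out} "There are $count{$name} $name Name(s)\n";
}
close $out;
```

Each name is written once per destination, so seeing two lines per name when both go to the terminal is expected; dropping either statement leaves a single line per name.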


Re^5: Word Count and Match
by eyepopslikeamosquito (Bishop) on Jan 08, 2021 at 15:51 UTC

    I see you're still using the /o modifier in:

    if($word=~/($namecnt)/io){

    Did you bother to read and understand my earlier reply?

Re^5: Word Count and Match
by Marshall (Canon) on Jan 09, 2021 at 20:16 UTC
    The first problem that I had with your code was the confusing name, $namecnt. I expected a numeric scalar for that type of name!
    I changed that variable name to $names2cnt to imply multiple names that will be counted, I assume in a case-insensitive manner.
    Added: William test case
    use strict;
    use warnings;

    my %name_count;
    my $names2cnt='David|Tom|Sam|Will|Dave|William|Thomas';

    while (my $line = <DATA>) {
        next unless ($line =~ /\S/);   #skip blank lines
        my $name = (split (":",$line))[2];
        $name_count{$1}++ if $names2cnt =~ /\b($name)\b/i;
    }

    foreach my $name (sort keys %name_count) {
        print "$name => $name_count{$name}\n";
    }

    =prints:
    Dave => 1
    Will => 3   #allows Will and WILL and WiLL spellings
    William => 1
    =cut

    __DATA__
    1:NAME:Bob:Bobville:Phone
    2:NAME:Dave:Davis:Phone
    3:NAME:Will:Willard:Phone
    4:NAME:Todd:Toadlane:Phone
    5:NAME:WILL:Street:Phone
    6:NAME:WiLL:Street2:phone2
    7:NAME:WilliaM:xyz:1234
    Update: I'm not sure that this \b stuff in the regex is necessary. I put some obvious test cases into the code, but not all possible test cases.
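
On the \b question: in the regex version above, \b is what stops a partial field such as Dav from matching inside David, so it does seem to be doing some work. One way to sidestep the question entirely is an exact hash lookup on the name field. The following is only a sketch of that alternative; the helper name count_name_field and the %canonical table are assumptions, not part of the code above.

```perl
use strict;
use warnings;

my @names = qw(David Tom Sam Will Dave William Thomas);

# Map the lower-cased form of each wanted name to its canonical spelling.
my %canonical = map { lc($_) => $_ } @names;

# count_name_field is a hypothetical helper: it looks only at the third
# colon-separated field and counts exact, case-insensitive matches.
sub count_name_field {
    my @records = @_;
    my %name_count;
    for my $record (@records) {
        next unless $record =~ /\S/;          # skip blank lines
        chomp(my $line = $record);
        my $name = (split /:/, $line)[2];
        next unless defined $name;            # record has no name field
        my $key = $canonical{ lc $name } or next;
        $name_count{$key}++;
    }
    return \%name_count;
}

my $counts = count_name_field(
    "1:NAME:Bob:Bobville:Phone\n",
    "2:NAME:Dave:Davis:Phone\n",
    "3:NAME:Will:Willard:Phone\n",
    "5:NAME:WILL:Street:Phone\n",
    "6:NAME:WiLL:Street2:phone2\n",
    "7:NAME:WilliaM:xyz:1234\n",
    "8:NAME:Dav:Partial:Phone\n",
);
print "$_ => $counts->{$_}\n" for sort keys %$counts;
# Dave => 1
# Will => 3
# William => 1
```

Because the lookup is on the whole field, a partial name like Dav is never counted, and names containing regex metacharacters cannot affect the match.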
