PerlMonks  

Re^2: Word Count and Match

by PilotinControl (Pilgrim)
on Jan 07, 2021 at 21:37 UTC ( [id://11126567]=note )


in reply to Re: Word Count and Match
in thread Word Count and Match

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re^3: Word Count and Match
by davido (Cardinal) on Jan 07, 2021 at 22:50 UTC

    Your sample code is not a short, self-contained snippet that demonstrates the behavior you describe. Your data is not part of the code. The relevant code sits in a subroutine that your example never calls, and it has external dependencies that aren't needed and aren't loaded by the snippet. That means anyone who wants to help has to refactor the code so that it can run stand-alone, and in doing so might inadvertently fix the very thing you describe as working incorrectly. Precision in these sorts of things is important. We shouldn't have to chase you down to get a demonstration of the actual bug that we can repeat in our own tests.

    I took a stab at doing this; I pulled the data into a __DATA__ segment, removed the need for external files, and caused your output to print entirely to STDOUT, but preserved in the output the names of the handles you were printing to. I also removed the colorization, since it is an external dependency that you didn't "use" in your snippet. So I assume it's not part of the problem. Additionally, I added the formatting back in that I needed to be able to understand your code. Having done all that stuff that you should have done, this is what I came up with:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %count;
    my $namecnt = 'David|Tom|Sam|Will|Dave|William|Thomas';

    while (<DATA>) {
        my @words = split(":");
        foreach my $word (@words) {
            if ($word =~ /($namecnt)/io) {
                $count{$1}++;
            }
        }
    }

    foreach my $word (sort keys %count) {
        printf("(STDOUT) %39s %-14s %-19s %6s", "There are", $count{$word}, $word, "Name(s)\n");
        print "(OUTPUT) There are $count{$word} $word Name(s)\n";
    }

    __DATA__
    1:NAME:Bob:Phone
    2:NAME:Dave:Phone
    3:NAME:Will:Phone
    4:NAME:Todd:Phone

    And when I run that I get:

    (STDOUT) There are 1 Dave Name(s)
    (OUTPUT) There are 1 Dave Name(s)
    (STDOUT) There are 1 Will Name(s)
    (OUTPUT) There are 1 Will Name(s)

    Which, to me, is an indication that the code is behaving as designed, and that you are NOT getting some summary count at the end like you said you are getting. At least not from the code you provided.

    So where are we now? You asked a question, and people said that the original question didn't demonstrate the problem being described. You took a stab at providing a better example of the code, but still failed to demonstrate that there is actually a problem with the code you posted. If I had to guess, I would say you have a print statement somewhere that you have forgotten about. Either way, people who are just trying to help had their willingness to help squandered.


    Dave

      > Either way, people who are just trying to help had their willingness to help squandered.

      He has always been like this, and always got similar replies.

      And next time will be a déjà vu again.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        I know. I keep hoping, given the number of years, things will be better.


        Dave

      @Dave: There actually is a problem with the code. I've modified the code below and the output. As you can see it lists the names twice instead of just once. That's why I wanted to match the data exactly using the specific field in the file.

      #!/usr/bin/env perl
      use strict;
      use warnings;

      my %count;
      my $namecnt = 'David|Tom|Sam|Will|Dave|William|Thomas';

      while (<DATA>) {
          my @words = split(":");
          foreach my $word (@words) {
              if ($word =~ /($namecnt)/io) {
                  $count{$1}++;
              }
          }
      }

      foreach my $word (sort keys %count) {
          printf("(STDOUT) %39s %-14s %-19s %6s", "There are", $count{$word}, $word, "Name(s)\n");
          print "(OUTPUT) There are $count{$word} $word Name(s)\n";
      }

      __DATA__
      1:NAME:Bob:Bobville:Phone
      2:NAME:Dave:Davis:Phone
      3:NAME:Will:Willard:Phone
      4:NAME:Todd:Toadlane:Phone

      (STDOUT) There are 1 Dave Name(s)
      (OUTPUT) There are 1 Dave Name(s)
      (STDOUT) There are 2 Will Name(s)
      (OUTPUT) There are 2 Will Name(s)

        When matching, use anchors in the regex so that it only matches the whole field:
        my $namecnt = qr/^(David|Tom|Sam|Will|Dave|William|Thomas)$/i;
        ^ matches at the beginning, $ matches at the end. The /i goes on the qr// itself, because a compiled regex keeps its own flags when it is used in a later match.

        The match will now look like this (I removed the /o as it's not recommended):

        if ($word =~ $namecnt) {

        Note that the first argument to split is a regex. It's clearer to write

        my @words = split /:/;
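
        A minimal sketch of the difference the anchors make, using two records from the sample data above. The helper name count_matches is hypothetical, and the /i flag sits on the qr// objects on the assumption that matching should stay case-insensitive:

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: count the fields of colon-separated records
        # that match $re, keyed by whatever the first capture group caught.
        sub count_matches {
            my ($re, @records) = @_;
            my %count;
            for my $record (@records) {
                for my $field (split /:/, $record) {
                    $count{$1}++ if $field =~ $re;
                }
            }
            return \%count;
        }

        my @records = ('2:NAME:Dave:Davis:Phone', '3:NAME:Will:Willard:Phone');

        # Unanchored: "Will" is also found inside "Willard".
        my $loose  = qr/(David|Tom|Sam|Will|Dave|William|Thomas)/i;
        # Anchored: the whole field has to be one of the names.
        my $strict = qr/^(David|Tom|Sam|Will|Dave|William|Thomas)$/i;

        my $loose_count  = count_matches($loose,  @records);
        my $strict_count = count_matches($strict, @records);

        print "unanchored: Will => $loose_count->{Will}\n";    # Will => 2
        print "anchored:   Will => $strict_count->{Will}\n";   # Will => 1
        ```

        Unanchored, the field Willard is counted as a second Will; anchored, only the field that is exactly Will counts.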

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        > As you can see it lists the names twice instead of just once.

        Do you understand the effects of print and printf?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        No, there is no problem with the code's output. Your original code was printing using printf to STDOUT, and it was printing via print to the OUTPUT file handle. Two distinct destinations; one the terminal, the other a file. For the purposes of this demonstration I switched it so that both print to STDOUT, because printing to a file adds complexity to the example that you don't need. But in the print output I identify which output stream your original code would have printed to. So it is intentional that we are getting output twice. As you can see it is not identical output. And as you can see from the code, it is because we have a printf statement, and a print statement, just as your original code did.

        So in other words, you still haven't shown us the code that is misbehaving, unless the misbehavior was that you were printing to STDOUT and to an output file because you have two print statements. Do you need for us to suggest removing one of the two print statements?
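
        To illustrate the point about two destinations, here is a small sketch, not the original program. The counts are hard-coded, and an in-memory handle ($out, a name chosen here) stands in for the OUTPUT file handle so that nothing is written to disk:

        ```perl
        use strict;
        use warnings;

        # Hard-coded counts, standing in for the result of the counting loop.
        my %count = (Dave => 1, Will => 2);

        # In-memory handle used in place of a real output file (an assumption
        # made only to keep the sketch self-contained).
        my $report = '';
        open(my $out, '>', \$report) or die "Cannot open in-memory handle: $!";

        for my $name (sort keys %count) {
            # Formatted line, sent to STDOUT.
            printf("%-10s %3d\n", $name, $count{$name});
            # Plain line, sent to the second destination.
            print {$out} "There are $count{$name} $name Name(s)\n";
        }
        close($out);

        print "--- second destination ---\n", $report;
        ```

        Each name appears once per destination. With both destinations pointed at the terminal, as in the demonstration above, that shows up as two lines per name.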


        Dave

        I see you're still using the /o modifier in:

        if($word=~/($namecnt)/io){
        Did you bother to read and understand my earlier reply?
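
        For anyone wondering why /o is discouraged, a minimal sketch (the helper match_each is made up for this illustration): /o tells Perl to interpolate the variable only the first time that match is executed, so later changes to the variable are silently ignored.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: match the string 'dog' against each pattern in
        # turn, either with or without the /o modifier.
        sub match_each {
            my ($use_o, @patterns) = @_;
            my @results;
            for my $pat (@patterns) {
                if ($use_o) {
                    # /o: $pat is interpolated once and then frozen.
                    push @results, ('dog' =~ /$pat/o ? 1 : 0);
                }
                else {
                    # No /o: the pattern follows the current value of $pat.
                    push @results, ('dog' =~ /$pat/ ? 1 : 0);
                }
            }
            return @results;
        }

        my @without_o = match_each(0, 'cat', 'dog');
        my @with_o    = match_each(1, 'cat', 'dog');

        print "without /o: @without_o\n";   # 0 1
        print "with /o:    @with_o\n";      # 0 0, the pattern stayed 'cat'
        ```

        In the posted code $namecnt never changes, so /o has no visible effect there. It only becomes a trap once the variable does change.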

        The first problem that I had with your code was the confusing name, $namecnt. I expected a numeric scalar for that type of name!
        I changed that var name to "$names2cnt" to imply multiple names which will be counted, I assume in a case-insensitive manner.
        Added: William test case

        use strict;
        use warnings;

        my %name_count;
        my $names2cnt = 'David|Tom|Sam|Will|Dave|William|Thomas';

        while (my $line = <DATA>) {
            next unless ($line =~ /\S/);    # skip blank lines
            my $name = (split(":", $line))[2];
            $name_count{$1}++ if $names2cnt =~ /\b($name)\b/i;
        }

        foreach my $name (sort keys %name_count) {
            print "$name => $name_count{$name}\n";
        }

        =prints:
        Dave => 1
        Will => 3        #allows Will and WILL and WiLL spellings
        William => 1
        =cut

        __DATA__
        1:NAME:Bob:Bobville:Phone
        2:NAME:Dave:Davis:Phone
        3:NAME:Will:Willard:Phone
        4:NAME:Todd:Toadlane:Phone
        5:NAME:WILL:Street:Phone
        6:NAME:WiLL:Street2:phone2
        7:NAME:WilliaM:xyz:1234

        Update: I'm not sure that this \b stuff in the regex is necessary. I put some obvious test cases into the code, but not all possible test cases.
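
        On the \b question: without it, a partial field such as Wil would match inside Will, so the boundaries do matter when the field is interpolated into a regex. One alternative, sketched here under the assumption that the name is always the third colon-separated field, is an exact hash lookup on the lowercased field. The names %wanted and count_names are made up for this illustration:

        ```perl
        use strict;
        use warnings;

        # Names to count, keyed by lowercase form; the value is the spelling
        # used in the report.
        my %wanted = map { ( lc($_) => $_ ) } qw(David Tom Sam Will Dave William Thomas);

        # Hypothetical helper: count exact, case-insensitive matches of the
        # third field against %wanted.
        sub count_names {
            my (@lines) = @_;
            my %name_count;
            for my $line (@lines) {
                next unless $line =~ /\S/;              # skip blank lines
                my $name = (split /:/, $line)[2];       # third field holds the name
                next unless defined $name;              # skip short records
                my $canonical = $wanted{ lc($name) } or next;   # not a name we count
                $name_count{$canonical}++;
            }
            return \%name_count;
        }

        my $result = count_names(
            '1:NAME:Bob:Bobville:Phone',
            '2:NAME:Dave:Davis:Phone',
            '3:NAME:Will:Willard:Phone',
            '4:NAME:Todd:Toadlane:Phone',
            '5:NAME:WILL:Street:Phone',
            '6:NAME:WiLL:Street2:phone2',
            '7:NAME:WilliaM:xyz:1234',
        );

        print "$_ => $result->{$_}\n" for sort keys %$result;
        ```

        Because the field is never used as a pattern, a field containing regex metacharacters cannot change what gets matched.
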
Re^3: Word Count and Match
by eyepopslikeamosquito (Archbishop) on Jan 07, 2021 at 22:00 UTC
