PerlMonks  

List comparison problem

by perlmonkster (Initiate)
on Aug 16, 2019 at 02:12 UTC

perlmonkster has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I somehow lost a final version of a simple Perl script to compare 2 lists, and cannot seem to figure out what is causing a problem with the version I have.

The simple .pl script takes ListL.txt, compares it to ListH.txt, and then "Flags" any entries from ListH.txt that are on ListL.txt (plus gives two separate Counts at the bottom of the output). Using two short sample lists, the Counts are both correct, but for some reason one of the ListL.txt items that *should* show a ListH.txt "Flag" in the output does not. I've tried switching around Count statements, etc., but am completely baffled. Any insight as to how to fix things would be greatly appreciated.

Here's the short code

    use strict;
    use warnings;

    my %H_list;
    open my $H_list, '<', 'listH.txt' or die "Cannot open listH.txt: $!";
    while (my $line = <$H_list>) {
        chomp $line;
        $line =~ s/\r//g;   # removes windows CR characters
        $line =~ s/\s+$//;  # removes trailing white spaces
        $H_list{$line} = 1
    }
    close $H_list;

    my ($L_count, $H_count);
    open my $L_list, '<', 'listL.txt' or die "Cannot open listL.txt: $!";
    while (<$L_list>) {
        chomp;
        s/\r//;
        s/\s+$//;
        $L_count ++;
        print;
        $H_count ++ and print ' On List H' if exists $H_list{$_};
        print "\n";
    }
    print "List L UNIQUES: $L_count; FLAGGED From List H: $H_count \n";

Here are the two short Test Lists and Test output:

    (ListL.txt)
    ABC123
    DEF456
    GHI789

    (ListH.txt)
    ABC123
    GHI789

    (Test Output)
    ABC123
    DEF456
    GHI789 On List H
    List L UNIQUES: 3; FLAGGED From List H: 2

As you can see, ABC123 should also be "Flagged" as "On List H", and it is driving me NUTS as to why it is not.

Thanks very much.

-perlmonkster

Replies are listed 'Best First'.
Re: List comparison problem
by swl (Parson) on Aug 16, 2019 at 02:46 UTC

    The problem is in the postfix increment operation.

    $H_count ++ and print ' On List H' if exists $H_list{$_};

    Use an if block (or similar) instead, so that the print no longer depends on the value returned by the increment of $H_count. Others will be able to explain the reasons in detail.

        use strict;
        use warnings;

        my %H_list;
        open my $H_list, '<', 'listH.txt' or die "Cannot open listH.txt: $!";
        while (my $line = <$H_list>) {
            chomp $line;
            $line =~ s/\r//g;   # removes windows CR characters
            $line =~ s/\s+$//;  # removes trailing white spaces
            $H_list{$line} = 1
        }
        close $H_list;

        my ($L_count, $H_count);
        open my $L_list, '<', 'listL.txt' or die "Cannot open listL.txt: $!";
        while (<$L_list>) {
            chomp;
            s/\r//;
            s/\s+$//;
            $L_count ++;
            print;
            if (exists $H_list{$_}) {
                $H_count ++;
                print ' On List H';
            }
            print "\n";
        }
        print "List L UNIQUES: $L_count; FLAGGED From List H: $H_count \n";

    UPDATE: See for example node 776720.

Re: List comparison problem
by perlmonkster (Initiate) on Aug 16, 2019 at 03:35 UTC
    swl: I was working on this some more with no success, and then just read your solution. Now I can sleep tonight. THANK YOU VERY MUCH !!! -perlmonkster
      The reason for the problem is that when you run:

          $H_count ++ and ...

      for the first time, $H_count is still undefined, so the post-increment operator sets it to 1 but returns the old value, 0, i.e. a false value. Therefore, the statement following the and operator is not executed. The next time you run the same post-increment statement, it returns 1 (and subsequently other true values), so it works fine, as shown in the following test under the Perl debugger:

          DB<1> $h++ and print "foo";
          DB<2> $h++ and print "foo";
          foo

      This would work fine with the pre-increment operator, which returns the new value:

          DB<3> ++$i and print "foo";
          foo

      The solution suggested by swl is probably better because there is no hidden surprise in it.
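
      To make the difference between the two operators concrete, here is a minimal sketch that records what each increment expression evaluates to over three iterations. The variable names ($post, $pre, @post_values, @pre_values) are made up for this illustration and do not come from the original script.

```perl
use strict;
use warnings;

# Both counters start undefined, like $H_count in the original script.
my ($post, $pre);
my (@post_values, @pre_values);

for (1 .. 3) {
    push @post_values, $post++;   # postfix: yields the value BEFORE incrementing
    push @pre_values,  ++$pre;    # prefix:  yields the value AFTER incrementing
}

print "postfix yields: @post_values\n";   # postfix yields: 0 1 2
print "prefix yields:  @pre_values\n";    # prefix yields:  1 2 3
```

      The first postfix value is 0, which is false, so anything chained after it with "and" is skipped on the first match only. That matches the symptom in the question: the count ends up correct, but the first flag is never printed.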

      As a side comment, please note that in these two code lines:

          $line =~ s/\r//g;   # removes windows CR characters
          $line =~ s/\s+$//;  # removes trailing white spaces

      the first line isn't useful, because the second code line will remove all trailing white spaces, including the \r Windows CR character.
        the first line isn't useful, because the second code line will remove all trailing white spaces, including the \r Windows CR character.

        While I take your point, they are not entirely equivalent. The difference is that the first line removes all the \r characters wherever they appear in the line. The second does not do that.

            use strict;
            use warnings;
            use Test::More tests => 2;

            my $have = "foo\rbar\rbaz\r\n";
            my $want = "foobarbaz";
            $have =~ s/\s+$//;
            isnt $have, $want, 'Not all carriage returns removed';
            $have =~ s/\r//g;
            is $have, $want, 'All carriage returns removed';

        I've spent far too much time over the years fighting poorly-formed, non-compliant, randomly-encoded data originating from Windows to assume anything about the quality of such data. YMMV.
