Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
The problems are numerous. A line-by-line walkthrough is needed to cover them. Since you have persevered, and desire to improve your Perl skills, here is that walkthrough:
  • #!/usr/bin/perl
    This line needs to be followed with these two lines:
        use strict;
        use warnings;
    The "strict" and "warnings" lines turn on error checking for common mistakes, and force you to declare your variables.
  • print "reading 'ansvarig_vardenhet_utf8.txt'\n";
    Google says that "Ansvarig vardenhet" is Swedish for "Responsible care unit". If that is true, then this print statement (and its twin further down) is lying. They are not reading 'ansvarig_vardenhet_utf8.txt'; they are reading 'icd-10-codes.txt' and 'female1.txt'. If you just need reassurance about what point your program has reached, then use something like print "P1\n"; and print "P2\n";. Otherwise, include the actual file names. Do not lie; it will just confuse the next person who works on the program.
  • open(WIN, "icd-10-codes.txt");
    As written, if the ICD file is missing, then your program will seem to run correctly, but will silently fail to do any replacements. Silent failure is not acceptable. die() is used to halt a program on a critical error; it is often coupled with or so that the die happens when the first part of a statement fails. The $! variable contains the reason for the failure.
         open(WIN, "icd-10-codes.txt") or die "Cannot open file: $!"; .
    Some other changes that would be improvements, but are not mandatory:
    • You are not telling Perl whether to read or write the file, so it defaults to read. It is better to be explicit.
          open( WIN, '<', "icd-10-codes.txt" )
    • WIN is a "bareword" filehandle. We now encourage the use of a real variable for the filehandle.
          open( my $win_fh, '<', "icd-10-codes.txt" )
    • If you use a variable to hold the filename, then you can locate the actual filename at the top of the program, where such configurable info is usually located. It will also allow you to put the filename in the "die" error message, and in the "reading ansvarig_vardenhet_utf8.txt" line already discussed, without repeating yourself.
          my $icd_file = 'icd-10-codes.txt';
          #print "Reading '$icd_file'\n";
          open( my $win_fh, '<', $icd_file ) or die "Cannot open '$icd_file': $!";
    By the way, the missing "strict" and "warnings" lines, and the missing '<' in "open", tell me that you are learning Perl from outdated material. I highly recommend Learning Perl or Modern Perl (or its free PDF).
  • while($line = <WIN>) {
    Since we added "use strict", we now have to declare all variables, such as $line. Also, $line needs to have its newline removed, otherwise the ICD description will have the newline embedded in it, which will mis-format your replacements. The chomp() function removes a trailing newline if one is present.
        while ( my $line = <WIN> ) {
            chomp $line;
  • @columns = split(/\t/, $line);
    Since each line has exactly two fields, you should name them. You can split() any extra fields into a generic array, to help find any future problems with the file formatting.
        my ($lookup_code, $icd_code_and_text, @rest) = split /\t/, $line;
        warn "Extra tab in file '$icd_file at line $.\n" if @rest;
  • #print "$columns[1]";
    That line is clearly for debugging, but if you use it, the output will be indistinct from its neighbors. You need a terminating newline.
        #print "$columns[1]\n";
  • foreach my $value (@columns) { #print "$value\n" }
    When you need to inspect more than a single simple variable, use Data::Dumper. In fact, use it to inspect even single simple variables, too.
        # This "use" line goes at the top, just after "use warnings".
        use Data::Dumper; $Data::Dumper::Useqq=1;

        print Dumper \@columns;
    Also, your closing brace is not lined up with the 'f' in 'foreach'. The Perl compiler does not care, but alignment will help any human trying to read your code.
  • close(WIN);
    This would need "or die" if it was anything more complex than *reading* a file on a *local* filesystem. It is fine as-is.

Note that you have not preserved *any* data from the ICD file, so no ICD lookups can be done below this point! We will discuss how to fix this later, by copying the needed info to a hash or array.

(OK, before I added the my before @columns, you did technically preserve *some* ICD data, but since it was only the data from the last line in the file, it hardly counts.)

  • print "reading 'ansvarig_vardenhet_utf8.txt'\n"; open(IN, "female1.txt"); while($line = <IN>) {
    Fix as described above, including chomp().
  • %col = split(/ /, $line);
    No. This is nonsense. It would only make sense if you had a file with a single line, which contained all the keys and values separated by spaces. Instead of a hash, we need an array (declared, of course).
        my @cols = split / /, $line;
  • #print "$columns[1]\n";
    Use Data::Dumper.
  • foreach my $val (%col) {
    "for" is easier to type than "foreach", and either of them work. We have already changed from hash to array.
        for my $val (@cols) {
  • if($val == $columns[0]){ $val =~ s/$val/$columns[1]/g;
    The == operator compares two vars *numerically*. The eq operator compares two vars as strings, so it is the one you want. In other words, "3" == "3.0", but "3" ne "3.0". Also, if you have just determined that $val is an exact match, then s/// is overkill; just replace $var by assigning to it.
        if ( $val eq $columns[0] ) {
            $val = $columns[1];
    Note that, even with those changes, this code will not work, because @columns does not contain the whole table of ICD translations. Even if it did contain them, you would need *two* loops, nested, for this method to work. That would be needlessly complex and inefficient. Later, we will use a simple hash instead of @columns and two loops.
  • print IN ">>$val\n";
    This looks confused. print IN $anything_at_all will write to the IN file. Are you trying to overwrite the data in 'female1.txt'? Luckily, Perl will not let you, because you opened the file as read-only. Your original description says "...should be replaced with ICD-code files corresponding...", but you are not outputting the entire line. Is the '>>' a debugging marker, or does it indicate that you are trying to append to a file? Going by your original description, I think you need:
         print join( ' ', @columns ), "\n";
  • close(IN);
    Again, fine when reading locally.
  • I am a beginner in PERL.
    You are a beginner in Perl. (See PERL as shibboleth and the Perl community.)
    Welcome to the Perl Community!

The problem with your *approach* is just the failure to preserve the ICD data from the first half of the program so that the second half can use it. This can be done by declaring %lookup_icd above the first while() loop, and then within the while() loop, use:
    $lookup_icd{$icd_code} = $icd_code_and_text;

With a few more additions, it would look like this:

#!/usr/bin/perl use strict; use warnings; use Data::Dumper; $Data::Dumper::Useqq = 1; my $icd_path = 'icd-10-codes.txt'; my $in_path = 'female1.txt'; my $out_path = 'female1_corrected.txt'; open my $icd_fh, '<', $icd_path or die "Cannot open '$icd_path': $!"; #print "Reading '$icd_path'\n"; my %lookup_icd; while ( my $line = <$icd_fh> ) { chomp $line; my ( $lookup_code, $icd_code_and_text ) = split /\t/, $line; # print Dumper $lookup_code, $icd_code_and_text; if ( exists $lookup_icd{$lookup_code} ) { warn "Replacing $lookup_code;" . " was '$lookup_icd{$lookup_code}'," . " now '$icd_code_and_text'\n"; } $lookup_icd{$lookup_code} = $icd_code_and_text; } close $icd_fh; #print Dumper \%lookup_icd; #print "Reading '$in_path'\n"; #print "Writing '$out_path'\n"; open my $in_fh, '<', $in_path or die "Cannot open '$in_path': $!"; open my $out_fh, '>', $out_path or die "Cannot open '$out_path': $!"; while ( my $line = <$in_fh> ) { chomp $line; my @cols = split / /, $line; for my $possible_icd (@cols) { my $replacement_icd = $lookup_icd{$possible_icd}; if ($replacement_icd) { $possible_icd = $replacement_icd; } } print {$out_fh} join( ' ', @cols ), "\n"; } close $in_fh; close $out_fh or warn "Cannot close '$out_path': $!\nSome data may not have been + written.";

I strongly recommend spending some time with either of the books I mentioned above. You have ventured a little past "baby Perl", and so now will be better served by tutorial than by blind exploration.


In reply to Re: comparing file contents by Util
in thread comparing file contents by kumaar1986

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (4)
As of 2024-04-25 07:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found