
Re: incrementing already existing file

by roboticus (Chancellor)
on Feb 28, 2011 at 00:42 UTC


in reply to incrementing already existing file

wanttoprogram:

If you keep your indentation and other whitespace consistent and clean, it's easier to see errors in your code. Additionally, I like to declare variables where I need them, and I don't create variables I don't use. ;^) (In other words, I deleted a few variables that you weren't using.) So I altered your code a bit, like this:

    #!/usr/bin/perl -w
    use strict;

    open (MYFILE, "2hgs_d00_internal_nrg_e.dat");
    open (NEWF, "2HGS_bio_conv-min_p.pdb");

    while (<MYFILE>) {
        chomp;    # avoid \n at the end of each line
        if ($_ =~/ENERGY/) {
            for(my $count=1;$count<=1;$count++){
                my $chn = substr $_, 20, 3;
                my $nrgval = substr $_, 35, 8;
                while (<NEWF>) {
                    chomp;    # avoid \n at the end of each line
                    if ($_ =~/ATOM/){
                        for(my $count2=1;$count2<=1;$count2++){
                            my $chn2 = substr $_, 23, 3;
                            my $toprint = substr $_, 0, 65;
                            for($chn=1;$chn<=$chn2;$chn++){
                                if ($chn==$chn2){
                                    print " $toprint $nrgval \n";
                                }
                            }
                        }
                    }
                }
            }
        }
    }

Having done that, it's a bit easier to see why you get only one value from MYFILE. You read one record from it in the outermost loop, and then process the entire NEWF file in the inner loop. Then, when it's time to read the second record from MYFILE, NEWF has already been read to end-of-file, so the inner loop is skipped completely from then on.

Generally, if your code just creeps rightward like this, it's indicative of a problem of some sort. I'm normally uncomfortable with more than, say, four levels of indentation. Beyond that, I tend to either change my logic, or pull out some subroutines to simplify things.
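To see the exhausted-filehandle effect in isolation, here's a minimal sketch. The two in-memory "files" and their contents are made up for the demonstration (they are not your real data), and the names $outer and $inner are just placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the two data files (in-memory filehandles).
my $outer_data = "ENERGY 1\nENERGY 2\nENERGY 3\n";
my $inner_data = "ATOM a\nATOM b\n";

open(my $outer, '<', \$outer_data) or die "cannot open outer data: $!";
open(my $inner, '<', \$inner_data) or die "cannot open inner data: $!";

my @inner_counts;
while (my $o = <$outer>) {
    my $seen = 0;
    # After the first pass $inner is at end-of-file, so this loop body
    # never runs again unless the handle is rewound or reopened.
    while (my $i = <$inner>) {
        $seen++;
    }
    push @inner_counts, $seen;
}
close $inner;
close $outer;

print "inner lines seen per outer line: @inner_counts\n";
```

Only the first outer record ever sees any inner lines. Rewinding the inner handle on every pass would "fix" that, but it means re-reading the whole second file for each record of the first, so it's usually better to restructure the logic.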

One last thing: You have some strange loops in the form:

    for(my $count2=1;$count2<=1;$count2++){
        #stuff
    }

You know that the loop should execute only one time, right? I would normally assume you meant something else and just keyed in the wrong thing. But since you have it repeated I thought I'd point it out to you. For example, try this program out:

    #!/usr/bin/perl
    for (my $count2=1; $count2<=1; $count2++) {
        print "Count2: $count2\n";
    }

You should review how loops work, and then change the logic to do more of what you want. If you're wanting to work through both files in parallel, you might want to check out the logic in Re: How to deal with Huge data and/or Re: parallel reading.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^2: incrementing already existing file
by wanttoprogram (Novice) on Feb 28, 2011 at 02:00 UTC
    I am getting only one value repeatedly, as shown below. I want the value to repeat only while the column stays the same, and then change to a different number. It should look like the second output below. The output I am getting is:
    ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216
    ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.0 28.8216
    ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.0 28.8216
    ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.0 28.8216
    ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.0 28.8216
    ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.0 28.8216
    ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.0 28.8216
    ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.0 28.8216
    ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.0 28.8216
    ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.0 28.8216
    ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.0 28.8216
    ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.0 28.8216
    ATOM 13 SD MET 1 9.409 27.618 -14.738 1.00 0.0 28.8216
    ATOM 14 CE MET 1 10.824 27.291 -13.650 1.00 0.0 28.8216
    ATOM 15 HE1 MET 1 11.740 27.794 -14.026 1.00 0.0 28.8216
    ATOM 16 HE2 MET 1 10.631 27.662 -12.621 1.00 0.0 28.8216
    ATOM 17 HE3 MET 1 11.042 26.203 -13.587 1.00 0.0 28.8216
    ATOM 18 C MET 1 5.446 25.905 -12.332 1.00 0.0 28.8216
    ATOM 19 O MET 1 4.414 26.443 -11.925 1.00 0.0 28.8216
    ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 28.8216
    ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.0 28.8216
    ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.0 28.8216
    ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.0 28.8216
    ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.0 28.8216
    ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.0 28.8216
    ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.0 28.8216
    ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.0 28.8216
    ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.0 28.8216
    ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.0 28.8216
    ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.2 28.8216
    ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.0 28.8216
    ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.8 28.8216
    ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.0 28.8216
    ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.6 28.8216
    ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.0 28.8216
    ATOM 36 OG1 THR 3 10.614 27.598 -8.833 1.00 93.2 28.8216

    I should get this output:

    ATOM 1 N MET 1 4.440 25.987 -14.585 1.00 0.0 28.8216
    ATOM 2 HT1 MET 1 4.524 25.860 -15.614 1.00 0.0 28.8216
    ATOM 3 HT2 MET 1 4.109 26.952 -14.383 1.00 0.0 28.8216
    ATOM 4 HT3 MET 1 3.729 25.316 -14.228 1.00 0.0 28.8216
    ATOM 5 CA MET 1 5.708 25.747 -13.831 1.00 0.0 28.8216
    ATOM 6 HA MET 1 6.008 24.725 -14.011 1.00 0.0 28.8216
    ATOM 7 CB MET 1 6.792 26.728 -14.367 1.00 0.0 28.8216
    ATOM 8 HB1 MET 1 6.812 26.659 -15.475 1.00 0.0 28.8216
    ATOM 9 HB2 MET 1 6.502 27.769 -14.109 1.00 0.0 28.8216
    ATOM 10 CG MET 1 8.241 26.517 -13.880 1.00 0.0 28.8216
    ATOM 11 HG1 MET 1 8.298 26.709 -12.787 1.00 0.0 28.8216
    ATOM 12 HG2 MET 1 8.547 25.462 -14.049 1.00 0.0 28.8216
    ATOM 13 SD MET 1 9.409 27.618 -14.738 1.00 0.0 28.8216
    ATOM 14 CE MET 1 10.824 27.291 -13.650 1.00 0.0 28.8216
    ATOM 15 HE1 MET 1 11.740 27.794 -14.026 1.00 0.0 28.8216
    ATOM 16 HE2 MET 1 10.631 27.662 -12.621 1.00 0.0 28.8216
    ATOM 17 HE3 MET 1 11.042 26.203 -13.587 1.00 0.0 28.8216
    ATOM 18 C MET 1 5.446 25.905 -12.332 1.00 0.0 28.8216
    ATOM 19 O MET 1 4.414 26.443 -11.925 1.00 0.0 28.8216
    ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 24.9274
    ATOM 21 HN ALA 2 7.105 24.825 -11.751 1.00 0.0 24.9274
    ATOM 22 CA ALA 2 6.383 25.717 -10.067 1.00 0.0 24.9274
    ATOM 23 HA ALA 2 6.344 26.791 -9.955 1.00 0.0 24.9274
    ATOM 24 CB ALA 2 5.300 25.034 -9.205 1.00 0.0 24.9274
    ATOM 25 HB1 ALA 2 4.288 25.319 -9.565 1.00 0.0 24.9274
    ATOM 26 HB2 ALA 2 5.394 23.928 -9.255 1.00 0.0 24.9274
    ATOM 27 HB3 ALA 2 5.396 25.346 -8.143 1.00 0.0 24.9274
    ATOM 28 C ALA 2 7.753 25.238 -9.659 1.00 0.0 24.9274
    ATOM 29 O ALA 2 8.299 24.357 -10.317 1.00 0.0 24.9274
    ATOM 30 N THR 3 8.353 25.813 -8.605 1.00 86.2 19.0884
    ATOM 31 HN THR 3 7.908 26.533 -8.079 1.00 0.0 19.0884
    ATOM 32 CA THR 3 9.687 25.408 -8.176 1.00 88.8 19.0884
    ATOM 33 HA THR 3 9.829 24.356 -8.373 1.00 0.0 19.0884
    ATOM 34 CB THR 3 10.847 26.194 -8.810 1.00 91.6 19.0884
    ATOM 35 HB THR 3 11.790 25.982 -8.261 1.00 0.0 19.0884
    ATOM 36 OG1 THR 3 10.614 27.598 -8.833 1.00 93.2 19.0884
      It looks like both files use column 5 as a "key" of sorts to connect the two files. I would approach this by reading all of the first file (the one you open as MYFILE), collecting the values from the last column along the way. Since you only need to collect one value from each line, I save them in an array as I read the file. This will work fine even for fairly large files. When the first file is processed, read from the second file (the one you open as NEWF) and do the substitutions (line by line), writing the output as we go.
      #!/usr/bin/env perl
      use strict;
      use warnings;

      my $file1 = "pm-890461-in1.txt";
      my $file2 = "pm-890461-in2.txt";

      open( MYFILE, '<', $file1 ) or die "cannot open $file1: $!";
      open( NEWF, '<', $file2 ) or die "cannot open $file2: $!";

      my @in_values;

      while ( <MYFILE> ) {
          chomp;
          my( $index, $value ) = ( split /\s+/ )[4, -1];
          # above line does same thing as next three
          # my @fields = ( split /\s+/ );
          # my $index = $fields[4];
          # my $value = $fields[-1];
          $in_values[ $index ] = $value;
      }
      close MYFILE;

      while ( <NEWF> ) {
          chomp;
          my @fields = ( split /\s+/ );
          my $index = $fields[4];
          $fields[-1] = $in_values[ $index ];
          my $output = join "\t", @fields;
          print "$output\n";
      }
      close NEWF;
      Note that I use split (not substr) to get the fields of interest from each line (same approach for both files). For the output, I join the fields with a tab character. You should change that to something else (e.g., a fixed number of space characters) if you need the output formatted differently. And of course this writes to STDOUT, so you will need to redirect the output on the command line or add to this code to open an output file and print to that.
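For what it's worth, here is a rough sketch of those two changes: printing to an output file rather than STDOUT, and using something other than a tab between fields. The file name merged-output.txt is a made-up placeholder, the single sample record is copied from the desired output shown earlier in the thread, and the sprintf column widths are illustrative guesses rather than the official PDB column layout:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical output name; pick whatever suits your data.
my $outfile = "merged-output.txt";

# One sample record, already split into fields as in the loop above.
my @fields = qw(ATOM 20 N ALA 2 6.330 25.384 -11.469 1.00 0.0 24.9274);

open( my $out, '>', $outfile ) or die "cannot open $outfile: $!";

# Join with a fixed two-space separator instead of a tab ...
my $spaced = join "  ", @fields;
print {$out} "$spaced\n";

# ... or lay the fields out in fixed-width columns with sprintf.
# The widths here are illustrative guesses, not the PDB column spec.
my $aligned = sprintf "%-6s%5s %-4s %-4s%4s %8s%8s%8s%6s%6s %8s", @fields;
print {$out} "$aligned\n";

close $out or die "cannot close $outfile: $!";
```

In the real script the join or sprintf line would go inside the second while loop, in place of the tab join.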

      When you are more comfortable with Perl, you will find that some of this is actually on the "verbose" side. Using Perl idioms would make some of my code more compact, but also a bit harder to follow until you have more experience.

        Thank you very much. The code worked perfectly well. But I have one last issue. There are two sets of values 'A' and 'B'. The code you gave me is considering B set values only. Is there any way I can ask it to look for A first and then move to B. Thank you again. It was very helpful.

      wanttoprogram:

      OK, are both files sorted with respect to the key fields? If so, then you don't really want nested loops. You want a single loop and you can decide which file to read depending on what the current condition is. Something like:

      # "Prime the pump"
      my $rec1 = <FILE1>;
      my $rec2 = <FILE2>;

      # Keep looping as long as either file has records
      while (!eof(FILE1) or !eof(FILE2)) {

          # Figure out what keys you have
          my $key1 = get_key_1($rec1);
          my $key2 = get_key_2($rec2);

          if ($key1 eq $key2) {
              # They're the same, so create an output record, and read
              # next record from file2
              print build_record($rec1, $rec2);
              $rec2 = <FILE2>;
          }
          elsif ($key1 lt $key2) {
              # First file has a key we don't need, just ignore
              # it and read the next record
              $rec1 = <FILE1>;
          }
          else {
              # Hmmm ... first file seemed to skip the key we need.
              # print a partial record and advance to next file2 record
              print partial_record($rec2);
              $rec2 = <FILE2>;
          }
      }
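A runnable variant of that merge idea might look like the sketch below. Everything specific in it is an assumption made for illustration: the two in-memory "files", the "key value" record layout, the get_key helper, and the "(no match)" marker for partial records. It loops on the records themselves rather than on eof(), because after priming the pump an eof() test can end the loop before the last records have been handled:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: "key value" per line, both sorted by numeric key.
my $data1 = "1 28.8216\n2 24.9274\n3 19.0884\n";
my $data2 = "1 N\n1 CA\n2 N\n4 N\n";

open(my $fh1, '<', \$data1) or die "cannot open data1: $!";
open(my $fh2, '<', \$data2) or die "cannot open data2: $!";

# Assumed record layout: the key is the first whitespace-separated field.
sub get_key { my ($rec) = @_; return (split ' ', $rec)[0]; }

my @output;
my $rec1 = <$fh1>;    # "prime the pump"
my $rec2 = <$fh2>;

# Loop on the records themselves rather than on eof(), so the last
# record of each file is still processed.
while (defined $rec2) {
    chomp $rec2;
    my $key2 = get_key($rec2);

    # Skip file-1 records whose keys are smaller than the one we need.
    while (defined $rec1 and get_key($rec1) < $key2) {
        $rec1 = <$fh1>;
    }

    if (defined $rec1 and get_key($rec1) == $key2) {
        my $value = (split ' ', $rec1)[1];
        push @output, "$rec2 $value";        # matched record
    }
    else {
        push @output, "$rec2 (no match)";    # partial record
    }
    $rec2 = <$fh2>;
}

print "$_\n" for @output;
```

Note that this sketch compares keys numerically (< and ==). The eq and lt operators compare as strings, so "10" sorts before "2"; for keys like residue numbers the numeric operators are probably what you want.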

      Of course, if either of the files isn't sorted on the keys, then that won't work. You'll either have to sort them, or try something like a hash table. For the hash table, you simply read the first file into a hash based on the key field(s). Then you scan through the second file, looking up values from the hash as you need them. Something like:

      # Read dictionary
      my %abbreviations;
      while (my $line = <DATA>) {
          my ($abbrev,$longname) = split/:/, $line;
          $abbreviations{$abbrev}=$longname;
      }

      # Process file
      open my $FH, '<', 'the_file' or die;
      while (my $line = <$FH>) {
          my ($field1, $field2, $key, $field3) = split /\t/, $line;
          if (exists $abbreviations{$key}) {
              # key was abbreviated, replace with full value
              $key = $abbreviations{$key};
          }
          print "$key: ($field1, $field2, $field3)\n";
      }
      close $FH;

      __DATA__
      perl:pathologically eclectic rubbish lister
      lisp:lots of irritating silly parenthesis
      python:all your space are belong to us
      ruby:a quack language
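Applied to the two sets 'A' and 'B' mentioned above, one plausible reason only the B values show up is that both sets share the same key, so the later set overwrites the earlier one in the lookup table. A hash keyed on more than one field avoids that. The sketch below is only a guess at the shape of the problem: the "set key value" layout, the sample lines (including the B values, which are invented), and the 'NA' placeholder are all assumptions, so adjust them to the real columns:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: "set key value" in the first file and
# "set key atom" in the second. The B values are made up.
my $energy_data = "A 1 28.8216\nA 2 24.9274\nB 1 30.1000\nB 2 22.5000\n";
my $atom_data   = "A 1 N\nA 2 CA\nB 1 N\nB 2 CA\nB 3 N\n";

# Read the first file into a hash
open(my $energy, '<', \$energy_data) or die "cannot open energy data: $!";
my %value_for;
while (my $line = <$energy>) {
    my ($set, $key, $value) = split ' ', $line;
    # Key on set AND key field so set B cannot overwrite set A.
    $value_for{"$set:$key"} = $value;
}
close $energy;

# Scan the second file, looking up values as needed
open(my $atoms, '<', \$atom_data) or die "cannot open atom data: $!";
my @output;
while (my $line = <$atoms>) {
    chomp $line;
    my ($set, $key) = split ' ', $line;
    my $lookup = "$set:$key";
    my $value  = exists $value_for{$lookup} ? $value_for{$lookup} : 'NA';
    push @output, "$line $value";
}
close $atoms;

print "$_\n" for @output;
```

Since the hash doesn't care about order, this works whether or not the files are sorted, and A and B are looked up independently.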

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.
