http://qs321.pair.com?node_id=1141980


in reply to Dear Venerable Monks

Here is a possible rewrite of the beginning of your code, just to give an example of modern best practices:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $CHR_NUM = $ARGV[0];
    defined $CHR_NUM or die "Usage: $0 CHR_NUM\n";   # fail early if no chromosome number is given
    my $file_location = "/home/ALL.chr$CHR_NUM.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz";
    my @id_files;   # you might find a better variable name for that array of arrays

    for my $x (1..75) {   # $x is a poor name for a variable, perhaps $file_nr would be better
        my $file_in = "/home/raj/HAPLOTYPES_JAN_2015/PROG_1_Approach_2/C22-$x";
        # three-argument open; if one fails, you'll be happy to know which of the 75 files it was
        open my $FH, '<', $file_in or die "Cannot open $file_in: $!";
        while (my $line = <$FH>) {
            chomp $line;
            push @{$id_files[$x]}, $line;
        }
        close $FH;
    }
Besides adding use strict; and use warnings; (strict, in turn, forces me to declare all variables, here with the my keyword), the main change is to use an array of arrays to store the data read from your 75 files. You'll end up with something like this:
    0  ARRAY(0x600500ae8)
       0  empty slot        # $x starts at 1, so there is no element at subscript 0
       1  ARRAY(0x600500c68)
          0  'line 1 file 1'
          1  'line 2 file 1'
          ... etc.
       2  ARRAY(0x600500b30)
          0  'line 1 file 2'
          ... etc.
       ... etc.
       75 ARRAY(0x6005.....)
          0  'line 1 file 75'
          ... etc.
I hope this makes the structure clear; don't hesitate to ask if you have any difficulty using such an array of arrays. The basic idea, if you need to read the element at subscript 2 of the array for file $x (that is, its third line, since subscripts start at 0), is this:
print $id_files[$x][2];
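If you want to experiment with the array of arrays without the 75 real files, here is a minimal, self-contained sketch. It assumes three small in-memory "files" whose contents are made up for illustration (@fake_files is a hypothetical name), opened as filehandles on scalar references. It fills @id_files the same way as the loop above, then walks it with nested loops.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the C22-$x files: each string plays the role of one file.
my @fake_files = (
    undef,                      # subscript 0 unused, as in the real code
    "id_a\nid_b\nid_c\n",
    "id_d\nid_e\n",
    "id_f\n",
);

my @id_files;
for my $file_nr (1 .. $#fake_files) {
    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $FH, '<', \$fake_files[$file_nr]
        or die "Cannot open in-memory file $file_nr: $!";
    while (my $line = <$FH>) {
        chomp $line;
        push @{ $id_files[$file_nr] }, $line;   # autovivifies the inner array
    }
    close $FH;
}

# Subscripts start at 0, so [1][1] is the second line of file 1.
print "File 1, subscript 1: $id_files[1][1]\n";

# Walk every file and every line, skipping the empty slot at subscript 0.
for my $file_nr (1 .. $#id_files) {
    my $lines = $id_files[$file_nr];            # reference to the inner array
    printf "File %d has %d line(s)\n", $file_nr, scalar @$lines;
    for my $line_nr (0 .. $#$lines) {
        print "  [$file_nr][$line_nr] = $lines->[$line_nr]\n";
    }
}
```

Note that $id_files[$file_nr] holds a reference, so @$lines gives the whole inner array and $#$lines its last subscript.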

I am not going any further in your code because I don't understand it; we would really need to see samples of the input data to understand what you are doing.

Please use meaningful variable names; this will make your life much easier.

One last comment: this "goto"! Besides the fact that this kind of goto is generally frowned upon, and has been for about 40 to 45 years (I have never felt I needed one in Perl, at least not this form of goto, which tends to break the tenets of structured programming), it seems to make little sense to set $choice to 0 and, immediately thereafter, to goto a place where you test whether $choice is equal to 1. This really looks like a design defect. If you just need to exit all the nested loops, there are better ways to control that, using last or sometimes next, in this case probably with a label.
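Here is a minimal sketch of the labeled-loop alternative. Since I can't see your real loops, the data, the search value $wanted and the result variables are hypothetical; the point is only that last FILE leaves both nested loops at once, so neither a flag like $choice nor a goto is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: a small array of arrays shaped like @id_files.
my @id_files = (undef, [qw(a b c)], [qw(d e)], [qw(f g)]);
my $wanted   = 'e';     # hypothetical value we are searching for
my ($found_file, $found_line);

FILE:                                   # label for the outer loop
for my $file_nr (1 .. $#id_files) {
    for my $line_nr (0 .. $#{ $id_files[$file_nr] }) {
        # Unlabeled next: skip to the next line of the current file.
        next if $id_files[$file_nr][$line_nr] ne $wanted;
        ($found_file, $found_line) = ($file_nr, $line_nr);
        last FILE;                      # exits BOTH loops, no goto needed
    }
}

if (defined $found_file) {
    print "Found '$wanted' in file $found_file at subscript $found_line\n";
} else {
    print "'$wanted' not found\n";
}
```

In the same way, next FILE would abandon the current file and move straight on to the next one.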

I hope this helps.

Re^2: Dear Venerable Monks
by A1 Transcendence (Initiate) on Sep 14, 2015 at 22:01 UTC
    Thank you Laurent and everyone else. All of your suggestions are very helpful and have already improved my abilities. I will get samples of the input data and rewrite my code before I post again. TYVM

      Are the IDs in the 75 files unique to each file, or does the same ID appear in more than one file?

      I assume the genotypes.vcf.gz file is several GB; is that correct?

      poj