PerlMonks  

input data got lost in foreach loop?!

by FluffyBunny (Acolyte)
on Nov 04, 2010 at 22:19 UTC ( [id://869571]=perlquestion )

FluffyBunny has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

I'm having trouble troubleshooting my script. Because my input files have so MANY lines, I made mock input files to check that my output file looks good. My program works well with the mock input files, but when I use the large input files (about 360k lines), my foreach loop only handles about 357k lines. I did not see any problem with my original input files. I would very much appreciate it if you could locate the problem. Let me know if you need any further information. :)

## Script description: This program takes input and match the chromosome location from Bowtie parser file,
# then adds the read # to the window (also including neighboring windows)

## Perl interpreter command
# use warnings;
use strict;

## Hashes initialized
my %input;
my %bowtieP;

## Open files
# file1 - input: create window # from position
#my $file1 = shift; # Input (chr # and position - base pair)
open (FILE1, "/data/GAII/prostate_cells/SNP/Prostate_SNP_filtered.txt");
my $head = <FILE1>; # If there is a header in the input #1, include this code
while (<FILE1>) {
    chomp;
    my $input_orig = $_;
    my @line = split /\s+/, $_;
    my $chr = $line[1];
    my $w = (int($line[2]/100))*100; # Window # created
    my $w1 = $w-100;
    my $w2 = $w+100;
    my $pos = "chr$chr\_$w"; # Get new variable chr_win for input
    $input{$pos}[0] = $input_orig;
    $input{$pos}[1] = $pos;
    $input{$pos}[2] = 0; # Read value initialized as 0
    $input{$pos}[3] = $chr;
    $input{$pos}[4] = $w;
    $input{$pos}[5] = "chr$chr\_$w1";
    $input{$pos}[6] = "chr$chr\_$w2";
    $input{$pos}[7] = 0; # Read value for win-100
    $input{$pos}[8] = 0; # Read value for win+100
}
close FILE1;

# file2 - bowtie parser output
#my $file2 = shift; # Bowtie Parser Input file (chr #, window #, and read value)
open (FILE2, "/data/GAII/prostate_cells/DU145v2_Bow_Per100.txt");
while (<FILE2>) {
    chomp;
    my @line = split /\s+/, $_;
    my $chr = $line[0];
    my $w = $line[1];
    my $read = $line[3];
    my $pos = "$chr\_$w"; # Get new variable chr_win for bowtie parser
    $bowtieP{$pos}[0] = $pos;
    $bowtieP{$pos}[1] = $read;
}
close FILE2;

open (OUT, "> test.txt"); ### Change if file name changes
print OUT "Name\tChr\tPosition\tGenTrain Score\tPrEC\tPrEC alleles\tRWPE\tRWPE alleles\tLNCaP\tLNCaP alleles\tDU145\tDU145 alleles\tchr_win\tread\tchr_win-100\tread-100\tchr_win+100\tread+100\ttotal\n";

foreach my $pos (keys %input) {
    if (exists $bowtieP{$input{$pos}[1]}[0]) {
        $input{$pos}[2]=$bowtieP{$pos}[1]; # Change read value from input (initialized as 0) to the read value from bowtie parser data
    }
    if (exists $bowtieP{$input{$pos}[5]}[1]) {
        $input{$pos}[7] = $bowtieP{$input{$pos}[5]}[1]; # Change read value from input (initialized as 0) to the read value from bowtie parser data
    }
    if (exists $bowtieP{$input{$pos}[6]}[1]) {
        $input{$pos}[8] = $bowtieP{$input{$pos}[6]}[1]; # Change read value from input (initialized as 0) to the read value from bowtie parser data
    }
    my $total = $input{$pos}[2] + $input{$pos}[7] + $input{$pos}[8];
    print OUT "$input{$pos}[0]\t$input{$pos}[1]\t$input{$pos}[2]\t$input{$pos}[5]\t$input{$pos}[7]\t$input{$pos}[6]\t$input{$pos}[8]\t$total\n";
}
close OUT;
exit;

OMG IT'S PERL!!

~(o.o~) (~o.o)~

Perl and a blind date both require regular expression.. -_-

Replies are listed 'Best First'.
Re: input data got lost in foreach loop?!
by jethro (Monsignor) on Nov 04, 2010 at 23:24 UTC

    I won't even try to find errors in your script with such scant information, but I think I can still help you help yourself.

    Perl has a built-in debugger and that is just what you need now. You can find the details with 'perldoc perldebug'.

    The important thing is that with the debugger you can single-step through the exact moment your bug happens and look at all the variables.

    To get to where the interesting things happen you need to set a breakpoint. For example, if you know that your script always breaks after 34050 lines, add a breakpoint inside the loop with 'b <linenumber> ++$countxy > 34048' and just let it run with the 'c' command.

    Two loop executions before your hot spot, the script will stop at your breakpoint and you can inspect any variable with 'p <varname>' or a nested data structure with 'x'. Then execute further step by step with 's' and watch what happens with 'p' or 'x' (also 'n' and 'r' help step over uninteresting parts).

    If you don't know the loop count but do know a condition that is true when your bad stuff happens, use that for a breakpoint. You can also use 'a' to run any checks while the program executes, to first find out at which loop iteration the bad stuff happens, then restart and set a breakpoint shortly before that point.
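A minimal sketch of the same idea written into the script itself rather than typed at the debugger prompt: perldebug documents that setting $DB::single = 1 makes the debugger pause at the next statement when the script runs under 'perl -d'. The loop, the counter name $countxy, and the threshold 34048 are placeholders taken from the example above; $stopped_at is a hypothetical helper so the snippet also shows something when run without the debugger.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real loop: 34050 iterations of some work.
my $countxy = 0;
my $stopped_at;
for my $i (1 .. 34050) {
    if (++$countxy > 34048) {
        no warnings 'once';          # $DB::single is otherwise mentioned only once
        $DB::single = 1;             # under "perl -d" the debugger pauses at the next statement
        $stopped_at //= $countxy;    # remember the first iteration that would stop
    }
    # ... the real per-line work would go here ...
}
print "first stop at iteration $stopped_at\n";
```

Run without -d, the assignment to $DB::single has no visible effect, so the line can stay in place while hunting the bug and be removed afterwards.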

Re: input data got lost in foreach loop?!
by ikegami (Patriarch) on Nov 04, 2010 at 22:25 UTC

    Could you tell us what the problem is?

    Start by reintroducing use warnings; so that errors are no longer hidden.
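As a small illustration of what enabled warnings can surface, the sketch below feeds two made-up rows through the same split-and-divide step the posted script uses. The rows are hypothetical (the second one lacks a position column), and the __WARN__ handler is only there to collect the messages so they can be counted; it is not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample rows; the second one is missing its position column.
my @rows = ("rs1 1 12345", "rs2 2");

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect instead of printing

for my $row (@rows) {
    my @line = split /\s+/, $row;
    my $w = int($line[2] / 100) * 100;   # warns when $line[2] is undef
}

print scalar(@warnings), " warning(s)\n";
print $warnings[0] if @warnings;
```

With warnings commented out, the short row would silently be treated as position 0 and nothing would be reported.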

Re: input data got lost in foreach loop?!
by aquarium (Curate) on Nov 05, 2010 at 00:13 UTC
    so i think this is not a perl/syntax bug per se, but you're just missing some expected output, which should be a 1 to 1 match for the number of lines in the input file.
    since you probably cannot provide the production file for us to test with, the crux of the problem will obviously be in identifying exactly which lines are missing in the output, and going from there. to that end you should be able to either rig up something directly in your program or use utilities to find which lines in the input file don't have a corresponding output line.
    i once had a curious missing output lines problem myself, during some data migrations for customer supplied data in text files. the perl script didn't see the problem at all: there were some EOF markers right in the input files, before the end of the files. i confirmed the problem by "cat file | wc" which showed fewer input lines than was purported by the data supplier, and used a hex editor to find and delete these early EOF markers. so the moral of it was that you can't assume a text file is a well formed text file, unless you generate it yourself to start with.
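One way to "rig up something directly in your program" might look like the sketch below. It rebuilds the chr_win key the posted script uses for %input and records every input line per key, so any key reached by more than one line stands out. The sample lines and the name %lines_for are hypothetical. This follows from how the posted script assigns into %input (a later line with the same key overwrites the earlier one); whether it accounts for the missing lines in the real data is not established in this thread.

```perl
use strict;
use warnings;

# Hypothetical sample lines in the "name chr position" layout the posted script splits on.
my @lines = (
    "rs1 1 12345",
    "rs2 1 12399",    # falls in the same 100-bp window as rs1
    "rs3 2 500",
);

my %lines_for;        # window key => every input line that mapped to it
for my $line (@lines) {
    my @f   = split /\s+/, $line;
    my $key = sprintf "chr%s_%d", $f[1], int($f[2] / 100) * 100;
    push @{ $lines_for{$key} }, $line;
}

# Keys that received more than one input line.
my @shared = grep { @{ $lines_for{$_} } > 1 } sort keys %lines_for;

printf "%d lines read, %d distinct keys\n", scalar @lines, scalar keys %lines_for;
for my $key (@shared) {
    print "$key is shared by:\n";
    print "    $_\n" for @{ $lines_for{$key} };
}
```

If the two counts differ on the real file, the listed lines are the ones to inspect first.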
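In place of a hex editor, a short scan for unexpected bytes could be sketched as follows. The data is a made-up in-memory string with a Ctrl-Z (0x1A) embedded in it; on a real file the open call would name the file instead. Treating everything outside tab, newline, carriage return and printable ASCII as suspect is an assumption that suits plain ASCII data files and would also flag legitimate non-ASCII text.

```perl
use strict;
use warnings;

# Hypothetical data standing in for a file's raw bytes: a Ctrl-Z (0x1A) sits mid-file.
my $data = "line one\nline two\x1Aline three\n";

open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
binmode $fh;                              # read raw bytes, no translation

my @suspect;
while (my $line = <$fh>) {
    # Flag every byte that is not tab, LF, CR, or printable ASCII.
    while ($line =~ /([^\t\n\r\x20-\x7E])/g) {
        push @suspect, sprintf("line %d, byte 0x%02X", $., ord $1);
    }
}
close $fh;

print "$_\n" for @suspect;
```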
    the hardest line to type correctly is: stty erase ^H
Re: input data got lost in foreach loop?!
by biohisham (Priest) on Nov 05, 2010 at 14:37 UTC
    The first problem is commenting out the warnings invocation line. There may be interesting cautionary remarks that you're removing from consideration by doing so.

    I did not see any problem with my original input files. I would very appreciate if you could locate the problem.
    The second problem here is that we don't have the input files. This removes a valuable insight into the possible sources of the problem, especially because you've mentioned that the code works fine for the sample files but not for the real input files. If you can extract the portion of the input files that reproduces this problem, I am sure one of the Monks here can provide something.

    I work with huge biological data files myself, and I get stuck aplenty when I extend code from my test sample to all the entries in a file. Just as aquarium mentioned, there may be some rogue elements lurking in the original files that you couldn't detect on your first pass through them, and these can be what is causing this problem. Sometimes something as small as an empty space in the wrong place, one that you needed to account for, can break your entire program.

    Examine your hashes with Data::Dumper to see if they contain what you expect in both cases, for your test files and for your original files. Also read How do I post a question effectively? and Perl and Bioinformatics for tips on how to ask questions about biological data.
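A minimal sketch of such an inspection, assuming a tiny hand-made %input with one entry shaped like the records the posted script builds (the values are hypothetical). Setting $Data::Dumper::Sortkeys gives a stable key order, which makes dumps from the test files and the original files easier to compare.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order makes two dumps easier to compare
$Data::Dumper::Indent   = 1;   # compact, fixed indentation

# Hypothetical entry in the shape the posted script stores: original line, key, read value.
my %input = (
    'chr1_12300' => [ "rs1 1 12345", 'chr1_12300', 0 ],
);

my $dump = Dumper(\%input);
print $dump;
print scalar(keys %input), " key(s)\n";
```

On the full data set the dump will be large, so printing only the key count, or dumping a handful of keys, may be more practical.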
