Dear monks,
How can I delete the lines of a file that occur twice, where the only difference between the two occurrences is a .f or .r suffix? In other words, when the same ID shows up with both .f and .r and the same value, both lines should go; an ID that appears only once stays. A sample input and the expected output are given below.
Sample input:
A0000.f BG_c22
A000X.f BG_c5
A000X.r BG_c5
A002B.f BG_c38
A002B.r BG_c38
A003A.r BG_c38
A0082.r BG_c12
A00AS.f BG_c52
A00B9.f BG_c45
A00B9.r BG_c45
A00DK.f BG_c5
A00F0.f BG_c22
A00F0.r BG_c22
A00F3.f BG_c14
A00FX.f BG_c7
A00FX.r BG_c7
The result should be:
A0000.f BG_c22
A003A.r BG_c38
A0082.r BG_c12
A00AS.f BG_c52
A00DK.f BG_c5
A00F3.f BG_c14
The code I have written is:
#!/usr/bin/perl
use strict;
use warnings;

# Slurp the whole input file into memory.
open(my $in, '<', 'pe_real_sample.txt') or die "cannot open input: $!";
my @arr = <$in>;
close($in);

# First pass: the input is sorted, so a duplicated pair always sits on
# two consecutive lines. Collect the IDs of those pairs.
my ($prev_id, $prev_val) = ('', '');
my @dels;
foreach my $line (@arr) {
    my ($id, $val) = $line =~ /^(\S+)\.[fr]\s+(\S+)/ or next;
    push @dels, $id if $id eq $prev_id && $val eq $prev_val;
    ($prev_id, $prev_val) = ($id, $val);
}

# Second pass: drop every line that mentions a collected ID. The
# alternation grows by one branch per duplicated ID.
my $del = join '|', map quotemeta, @dels;
my @arr1 = @dels ? grep { !/$del/ } @arr : @arr;

open(my $out, '>', 'last.output') or die "cannot open output: $!";
print $out @arr1;
close($out);
Everything is perfect as long as the file is small; the sample input above gives the correct result. But when I feed it 10 MB of data (about 3.5 million lines), the program never finishes: it runs for an awfully long time without either dying or producing a result. What do I do? Please help.
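I suspect the grep with one huge alternation (one branch per duplicated ID, matched against every one of the 3.5 million lines) is what kills it. Would a single hash of seen pairs be the right way instead? Here is a minimal, untested sketch of what I mean, reusing the file names from my sample above; it counts each "ID value" pair with the suffix stripped, then keeps only the lines whose pair occurs exactly once:

#!/usr/bin/perl
use strict;
use warnings;

open(my $in, '<', 'pe_real_sample.txt') or die "cannot open input: $!";
my @arr = <$in>;
close($in);

# Count each pair, using the same key for the .f and .r twins.
my %count;
for my $line (@arr) {
    $count{"$1 $2"}++ if $line =~ /^(\S+)\.[fr]\s+(\S+)/;
}

# Keep only the lines whose pair was seen exactly once.
open(my $out, '>', 'last.output') or die "cannot open output: $!";
for my $line (@arr) {
    print $out $line if $line =~ /^(\S+)\.[fr]\s+(\S+)/ && $count{"$1 $2"} == 1;
}
close($out);

On my sample this produces the six expected lines, and it should be linear in the number of lines, with no giant regex and no requirement that the input be sorted. Is this the right approach, or is there a better idiom?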
P.S.: This is also with reference to the post "deleting a specific element from an array".