Hello ken, thanks for the explanation in your reply. It makes a bit more sense to me now!
Yes to the following:
• You have multiple, tab-delimited files
• The first line of each file contains column headers
• Each file may have a different number of columns
However, I do want to keep the first column. I have columns named dataR(X) (e.g. dataR1, dataR2 ... dataR28), followed by several columns of links (some rows will be empty), which I also want to keep.
So right now, my problem is finding the headers that match dataS0XRx so that I can grab those columns and perform some calculations:
e.g.
first file.txt:
ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links
M45 345.2 536 876.12 873 http://..
M34 836 893 829 83.234
M72 873 123 342.36 837
M98 452 934 1237 938 http://..
===================================================
Calculation:
row2/row2, row3/row2, row4/row2...row3400/row2
row2/row3, row3/row3, row4/row3 ... row3400/row3
row2/row4, row3/row4 ...row3400/row4
E.g. dataS01R1
becomes:
ID dataS01R1 ..dataS01R02... Links
M45 1 (345.2/345.2) http://..
M34 2.42 (836/345.2)
M72 2.53 (873/345.2)
M98 1.309 (452/345.2) http://..
M45 0.41 (345.2/836) http://..
M34 1 (836/836)
M72 1.04 (873/836)
M98 0.54 (452/836) http://..
.
. (loop through rows as denominator)
.
After that, I want to loop through the columns, print the results, and filter out unwanted rows based on the average coefficient of variation across all dataSXR0X columns (which I will work out later, once I have the first part figured out).
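The calculation described above could be sketched roughly as follows. This is only a minimal illustration for a single column, using the dataS01R1 values from the example; the helper name `ratio_blocks` is hypothetical, and skipping zero denominators is an assumption, not something stated in the requirements.

```perl
use strict;
use warnings;

# Hypothetical helper: use each value in turn as the denominator and
# divide every value in the column by it.
# Returns one array reference of ratios per denominator row.
sub ratio_blocks {
    my @values = @_;
    my @blocks;
    for my $denom (@values) {
        # Assumption: skip a zero denominator rather than dividing by zero.
        next if $denom == 0;
        push @blocks, [ map { $_ / $denom } @values ];
    }
    return @blocks;
}

# IDs and dataS01R1 values taken from the example file above.
my @ids       = qw(M45 M34 M72 M98);
my @dataS01R1 = (345.2, 836, 873, 452);

# Print one block of ID/ratio lines per denominator row.
for my $block (ratio_blocks(@dataS01R1)) {
    for my $i (0 .. $#ids) {
        printf "%s\t%.3f\n", $ids[$i], $block->[$i];
    }
}
```

Note that the output grows with the square of the row count: 3400 rows give 3400 x 3400 ratios per column, so it may be better to print each block as it is computed than to hold everything in memory.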
So my problem here:
How do I find the column headers matching dataS0XR0X and put those columns into arrays for manipulation?
Here is the code I wrote initially, before posting on PerlMonks:
if ($first)
{
    # if this is the first file, find the column locations
    my $firstline = <CURINFILE>;    # read in the header line
    chomp $firstline;
    my @columns = split(/\t/, $firstline);
    my $columncount = 0;
    while ($columncount <= $#columns && !($columns[$columncount] =~ /ID/))
    {
        $columncount++;
    }
    $ID = $columncount;
    while ($columncount <= $#columns && !(($columns[$columncount] =~ /_dataS(\d+)R/)))
    {
        $columncount++;
    }
    $intensitydata = $columncount;
    # read in the remainder of the file
    while (<CURINFILE>)
    {
        # add the id, intensity values to an array
        chomp $_;
        my @templine = split(/\t/, $_);
        my @tempratio = ();
        push(@tempratio, $templine[$ID]);
        push(@tempratio, $templine[$intensitydata]);
        print "\nWriting output...";
I tried this code initially (before changing to the code I posted in my first post), but it doesn't print anything, and I do not know what went wrong.
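One possible cause, if the real headers look like the example (dataS01R1, with nothing in front of "data"): the pattern /_dataS(\d+)R/ requires a literal underscore before dataS, so the second loop would never match and $intensitydata would end up pointing past the last column. The loop also stops at the first match, so at best it finds one data column. Below is a minimal sketch of collecting every matching column with grep over the column indices. The sample header and rows come from the example above; the names `$id_col`, `@data_cols` and `%column_values` are hypothetical, and the anchored pattern assumes headers of exactly the form dataS<digits>R<digits>.

```perl
use strict;
use warnings;

# Header line from the example file, tab-delimited.
my $firstline = join "\t", qw(ID dataS01R1 dataS01R2 dataS02R1 dataS02R2 Links);
my @columns   = split /\t/, $firstline;

# Index of the ID column (exact match) and indices of every data column.
my ($id_col)  = grep { $columns[$_] eq 'ID' } 0 .. $#columns;
my @data_cols = grep { $columns[$_] =~ /^dataS\d+R\d+$/ } 0 .. $#columns;

# One array per data column, keyed by header name, plus the IDs in row order.
my %column_values;
my @ids;

# Two sample rows from the example; a real script would read them from the file.
my @rows = (
    "M45\t345.2\t536\t876.12\t873\thttp://..",
    "M34\t836\t893\t829\t83.234",
);
for my $line (@rows) {
    my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    push @ids, $fields[$id_col];
    push @{ $column_values{ $columns[$_] } }, $fields[$_] for @data_cols;
}

# Show which headers matched and the values collected for one of them.
print join(", ", map { $columns[$_] } @data_cols), "\n";
print "dataS01R1: @{ $column_values{dataS01R1} }\n";
```

Keeping the indices rather than the names makes it easy to pull the same positions out of every row, and the hash of arrays gives one array per data column to work on afterwards.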
I am working with large datasets. Initially I used Excel, but it is too slow and makes my whole computer lag when performing the calculations, so I decided to try Perl instead, as I read that it is good for manipulating large datasets. However, I am quite new to Perl; I only started two months ago, so I am not sure whether what I am doing is okay. If you have other suggestions, let me know too.
I hope my explanation is not confusing. :)