astrobio has asked for the wisdom of the Perl Monks concerning the following question:
I have to fill in missing values in a sparse matrix of 50 columns and millions of rows. The missing values are filled importantly from the next good value in a column. The physical problem is from a signal generator where each column has a different sample rate. Any suggestions on how to make the following fragile code more robust to backfill missing values?
It seems possible to do with just one data pass if the missing values are filled from the last good value only, and the first row was guaranteed filled to start.
which currently gives outputwhile (<DATA>){ s/^\s+|\s+//g; if(length($_)<1){ next;} @f=split(/,/,$_); push(@col1,{TIME=>$f[0],GOOD=>$f[1]}); push(@col2,{TIME=>$f[0],GOOD=>$f[2]}); if($unique{$f[0]}!=1){ push(@times,$f[0]); $unique{$f[0]}=1; } } # reverse sort by col1 values to backfill # forward sort by col1 time to catch last good foreach(sort {$$b{GOOD}<=>$$a{GOOD} || $$a{TIME}<=>$$b{TIME}} @col1){ if( ($$_{TIME}<$last_time) && ($$_{GOOD} == -999.9) ) { $out1{$$_{TIME}}=$last_good; }elsif($$_{GOOD} == -999.9){ $out1{$$_{TIME}}=$last_good; }else{ $last_time=$$_{TIME}; $last_good=$$_{GOOD}; $out1{$$_{TIME}}=$$_{GOOD}; } } foreach(sort {$$b{GOOD}<=>$$a{GOOD} && $$a{TIME}<=>$$b{TIME}} @col2){ if( ($$_{TIME}<$last_time) && ($$_{GOOD} == -999.9) ) { $out2{$$_{TIME}}=$last_good; }elsif($$_{GOOD} == -999.9){ $out2{$$_{TIME}}=$last_good; }else{ $last_time=$$_{TIME}; $last_good=$$_{GOOD}; $out2{$$_{TIME}}=$$_{GOOD}; } } foreach $time (@times){ print "$time,$out1{$time},$out2{$time}\n"; } # time,col1,col2 # (-999.9 out-of-range to indicate missing # and distinguish from true zeroes) __DATA__ 0.01,-999.9,1 0.02,-999.9,-999.9 0.03,-999.9,3 0.04,2,-999.9 0.05,-999.9,-999.9 0.06,5,4
# time,col1,col2 0.01,2,1 0.02,2,1 0.03,2,3 0.04,2,1 0.05,2,1 0.06,5,4
It seems possible to do with just one data pass if the missing values are filled from the last good value only, and the first row was guaranteed filled to start.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: How best to fill missing values in a sparse matrix?
by BrowserUk (Patriarch) on Mar 22, 2011 at 05:34 UTC | |
Re: How best to fill missing values in a sparse matrix?
by ELISHEVA (Prior) on Mar 22, 2011 at 07:18 UTC | |
Re: How best to fill missing values in a sparse matrix?
by wind (Priest) on Mar 22, 2011 at 09:56 UTC |
Back to
Seekers of Perl Wisdom