http://qs321.pair.com?node_id=181291

dimmesdale has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I have a problem. I trying to write a program to do some stuff with excel, and I'm using SpreadSheet::WriteExcel. The problem, however, lies with the data structures somewhere, I think.

I have data that looks like this:

TS21CCCFN30NT 64.77758146 63.02739802 92.64926038 90.68286 +212 0.043905373 0.061903769 TS1BL 53.5822136 62.62041357 71.03715952 71.16929716 5. +76499E-05 0.000314962 TS2BL 71.59567647 73.35932057 94.41989575 91.04652518 6 +.57045E-05 0.000587082 TS3BL 58.14568074 66.51773348 83.63006331 81.97901606 7 +.04488E-05 0.000387897 TS5BL 59.19892796 67.52430356 94.26526376 93.67353814 7 +.75011E-05 0.000316832 TS7BL 66.55656211 68.29674759 93.10938298 90.82023765 6 +.69424E-05 0.000200692 TS9BL 58.95471459 68.27327459 73.35588307 75.75919131 7 +.10676E-05 0.000211444 TS13BL 71.58954203 78.41099584 92.95413775 93.01392565 +6.37719E-05 0.000246732 TS15BL 54.77861261 57.36971063 93.07097848 90.77079334 +6.88732E-05 0.000205062
The number that follows TS is the test subject (identified by that number). Following that is the condition (one of 13). The data needs to be sorted by groups. There are three: low-hour vfr, low-hour instruments, and high-hour instruments (corresponding to certain test subjects; specifics in code). For these, I want to find the average for each condition, and display it in Excel.

This is the chunk that's giving me some problems. (the error in the tite)

for(0..3) { #here ($sums[$i])->[$_][$j] += $vals[$_+1]; if ($ts >= 21) { $sums[$i]->[4][$j] += $vals[5]; $sums[$i]->[5][$j] += $vals[6]; } } #and here $totals[$i]->[$j]->[0]++;
Here's the complete code to see how the ADTs evolve. (This is just a very preliminary 'test' to see what will work before I start doing some other things, but suggestions for better data types would be appreciated).
#!/usr/bin/perl -w use strict; use Spreadsheet::WriteExcel; my @groups = ([2,3,7,8,9,15,18,21,22,23,24,25,26,27], # Low hour vfr, low hour instruments [4,10,12,16,17,19], [1,5,6,11,13,14,20]); # high hour instruments my @cats = ('BL','BLSTDGG','PR1','PRFN1','PRFN3','PRFN30','EBG1', 'EBGFN1','EBGFN3','EBGFN30','CCFN1','CCFN30','CCFN30NT'); my $file = 'C:\WINDOWS\Profiles\chemphysio\Desktop\Test data\TEST\comp +lete_ordered2.txt'; open (TABFILE,$file) or die "$file: $!"; # Create a new Excel workbook my $workbook = Spreadsheet::WriteExcel->new('TEST.xls'); my $worksheet = $workbook->addworksheet(); my (@avgs,@sums,@totals); my @data = <TABFILE>; my $row = 0; for my $r (0..2) { for my $c (0..3) { for my $cat (0..$#cats) { ($sums[$r])->[$c][$cat] = 0; if ($r == 0) { ($sums[$r])->[4][$cat] = 0; ($sums[$r])->[5][$cat] = 0; } } } } for my $i (0..2) { for my $cat (0..$#cats) { $totals[$i]->[$cat]->[0] = 0; if ($i == 0) { $totals[$i]->[$cat]->[1] = 0; $totals[$i]->[$cat]->[1] = 0; } } } for (@data) { chomp; my @vals = split('\t', $_); $vals[0] =~ s/.txt//; my ($ts,$cat) = $vals[0] =~ /TS(\d+)(.+)$/; my ($i,$j); # Some of the data has slight errors in it (these were # the filenames, though they represent the condition) $cat =~ s/CCCFN30NT/CCFN30NT/; $cat =~ s/BLSTDGS/BLSTDGG/; $cat =~ s/BLSTDGG?/BLSTDGG/; for(0..$#cats) { if ($cats[$_] =~ /$cat/) { $j = $_ } } for(@{$groups[0]}) { if ($ts == $_) { $i = 0 } } for(@{$groups[1]}) { if ($ts == $_) { $i = 1 } } for(@{$groups[2]}) { if ($ts == $_) { $i = 2 } } # A basic representation of @sums (@avgs too) # REST|IMC ... represents one of six values (the sums) # condition is the array index for @cats # $sums[GROUP] -> [(REST|IMC)(HRT|SKT|(EMG)?) -> # [CONDITION] for(0..3) { ($sums[$i])->[$_][$j] += $vals[$_+1]; if ($ts >= 21) { $sums[$i]->[4][$j] += $vals[5]; $sums[$i]->[5][$j] += $vals[6]; } } $totals[$i]->[$j]->[0]++; if ($ts >= 21) { $totals[$i]->[$j]->[1]++ } my $col = 0; for (@vals) { $worksheet->write($row, $col, $_); $col++; } $row++; } for my $r (0..2) { for my $c (0..3) { for my $cat (0..$#cats) { $avgs[$r]->[$c][$cat] = $sums[$r]->[$c][$cat]/$totals[$r]->[$cat +]->[0]; if ($r == 0) { $avgs[$r]->[4][$cat] = $sums[$r]->[4][$cat]/$totals[$r]->[$cat +]->[1]; $avgs[$r]->[5][$cat] = $sums[$r]->[5][$cat]/$totals[$r]->[$cat +]->[1]; } } } } my $format = [ [@cats], $avgs[0]->[0], $avgs[0]->[1], $avgs[0]->[2], $avgs[0]->[3], $avgs[0]->[4], $avgs[0]->[5], ]; $worksheet->write_row('J6', $format);

I've worked with data dumper EXTENSIVELY, to a point where I'm pretty sure it has to do with the data STRUCTURE and not the data VALUES.

2002-07-12 Edit by zdog: Added READMORE tag

Replies are listed 'Best First'.
Re: Use of uninitialized value in array element
by Fastolfe (Vicar) on Jul 12, 2002 at 15:56 UTC
    What if one of these conditions fails:
    my ($ts,$cat) = $vals[0] =~ /TS(\d+)(.+)$/; ... for(0..$#cats) { if ($cats[$_] =~ /$cat/) { $j = $_ } } for(@{$groups[0]}) { if ($ts == $_) { $i = 0 } } for(@{$groups[1]}) { if ($ts == $_) { $i = 1 } } for(@{$groups[2]}) { if ($ts == $_) { $i = 2 } }
    It looks like somehow, somewhere, either $i or $j is left undefined. Add some print statements in your loops that print out the values of your various variables and check that your assumptions are valid here. Could there be an empty line in your input?
    A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Use of uninitialized value in array element
by Fastolfe (Vicar) on Jul 12, 2002 at 16:16 UTC
    You could simplify your code a lot by working with hashes instead of arrays.
    use List::Util 'sum'; while (<TABFILE>) { my @fields = split /\t/; my $identifier = shift @fields; my ($category, $group) = ($identifier =~ /^TS(\d+)(\w+)/); warn "Parse error", return unless $category && $group; # This may not be what you're doing, but change # this to do the calculations you need to do $sums{$group}->{$category} += sum(@fields); $totals{$group}->{$category} += @fields; } # Another way of cleaning up bad data foreach (keys %sums) { $sums{$_}->{CCFN30NT} += delete $sums{$_}->{CCCFN30NT}; $totals{$_}->{CCFN30NT} += delete $totals{$_}->{CCCFN30NT}; ... }
    Also note that Excel can work with CSV or tab-delimited files natively, and more importantly, you can use formulas in Excel to calculate a lot of this. Your script need only output an Excel formula in a field (preceded with an equal sign) instead of calculating the value in advance.
Re: Use of uninitialized value in array element
by broquaint (Abbot) on Jul 12, 2002 at 15:16 UTC
    If you're getting the expected results and know for certain that the values may not be initialized then you can just turn off warnings in that area e.g
    { no warnings 'uninitialized'; for(0..3) { ($sums[$i])->[$_][$j] += $vals[$_+1]; if($ts >= 21) { $sums[$i]->[4][$j] += $vals[5]; $sums[$i]->[5][$j] += $vals[6]; } } $totals[$i]->[$j]->[0]++; }
    I really don't recommend doing this every time you get an uninitialized error, but if you expect there to be uninitialized values and it's a hassle to initialize them (see. chance for more bugs to creep in) then this can be very handy.
    HTH

    _________
    broquaint

      What is it referring to about initialized, though? The @vals array is all there (all initialized). THe @sums array is initialized all to zero in the beginning. I thought maybe it meant the $i, $_, $j indices, but they too all have values.

      (Also, I'm getting incorrect results; this may be a result of another part of the code, though)

      Update: Shoot... didn't hit stop in time. If someone sees this with the right 'power', can they delete the above node (it was "truly a duplicate" with some info left out...)

Re: Use of uninitialized value in array element
by dimmesdale (Friar) on Jul 12, 2002 at 16:35 UTC
    Fastolfe: about hashes, I did that first, and then realized (or thought) that it would be a mess to transfer the hash into an anonymous array as needed by write_row.

    As to your other suggestion, I will try it out. Hmm.. how do you us WriteExcel to give it tabbed data? It comes with a tab2xls.pl file, so I figured you have to use that. I tried just giving it an array, then a filehandle, but that doesn't work. I'll just stick with what I had unless someone knows a better way.

    HOWEVER, I figured out what was wrong. Originally to solve the problem where a couple of conditions had a letter missing/one wrong letter, I used regular expressions in the @cats array. So I had a =~ instead of an eq. However, when I changed this to add the substitution I kept the =~ regex bind in, thinking "nothing will go wrong". WELL, HOW WRONG I WAS! It was matching strings I wanted another category to match (i.e., only part of the string matched, but still true result returned). I put an 'eq' there and it works perfectly (well, there's still some bugs I'm chasing down)

A reply falls below the community's threshold of quality. You may see it by logging in.