Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

rowspan vals in Excel SS are not interpreted by ParseExcel::Simple

by misterperl (Pilgrim)
on Feb 08, 2017 at 19:52 UTC ( [id://1181449]=perlquestion: print w/replies, xml ) Need Help??

misterperl has asked for the wisdom of the Perl Monks concerning the following question:

I searched "rowspan" for previous messages about this topic, but results were all related to HTML tables. I also searched the CPAN methods for this module, and saw none seemingly related to the problem. Same with "known issues".. I'm using Spreadsheet::ParseExcel::Simple to read a spreadsheet. I'm having trouble with cells than span multiple rows such as:
Animal Name ------------------ | Fluffy |-------- Cat | Vera |------- | Manxie ---------|-------
I get back:
row 1 : '', 'Fluffy' row 2 : 'Cat', 'Vera' row 3 : '', 'Manxie'
even though a human can tell that 'Cat' belongs in all 3 rows. Is there a setting, or another parser that I can use to deal with this? The problem is there is no indicator I can interpret to tell what rows a given value spans. Thank-You

Replies are listed 'Best First'.
Re: rowspan vals in Excel SS are not interpreted by ParseExcel::Simple
by poj (Abbot) on Feb 08, 2017 at 20:51 UTC

    Spreadsheet::ParseExcel::Worksheet has a get_merged_areas() method

    #!perl use strict; use Spreadsheet::ParseExcel; my $parser = Spreadsheet::ParseExcel->new(); my $workbook = $parser->parse('merge.xls'); for my $worksheet ( $workbook->worksheets() ) { my $range = $worksheet->get_merged_areas(); for (@$range){ print " start row : $_->[0] start col : $_->[1] end row : $_->[2] end col : $_->[3] \n\n"; } }
    poj
      Thanks guys- I'll take a look at worksheets and merge/center.. It may solve the problem... Voted you all ++ I greatly appreciate the time you took to reply !
Re: rowspan vals in Excel SS are not interpreted by ParseExcel::Simple
by FreeBeerReekingMonk (Deacon) on Feb 08, 2017 at 21:48 UTC
    Simple does not do spanning... Now the Spreadsheet::ParseExcel data is still accessible in Spreadsheet::ParseExcel::Simple

    use Spreadsheet::ParseExcel::Simple; use Data::Dumper; my $xls = Spreadsheet::ParseExcel::Simple->read('excel.xls'); foreach my $sheet ($xls->sheets) { die Dumper $sheet; #while ($sheet->has_data) { # my @data = $sheet->next_row; #} }

    Take a look at the fields AlignH, AlignV and Merged. If Merged=1 you can have a value or no value.

    So, in order to keep on using Spreadsheet::ParseExcel::Simple I added a small unmerge() function that removes the span's:

    use Spreadsheet::ParseExcel::Simple; my $xls = Spreadsheet::ParseExcel::Simple->read('excel.xls'); foreach my $sheet ($xls->sheets) { #unmerge unmerge($sheet->{sheet}); while ($sheet->has_data) { my @data = $sheet->next_row; print join("\t", @data),"\n"; } } sub unmerge{ my($oWkS) =@_; print "--------- SHEET:", $oWkS->{Name}, "\n"; for(my $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) + { for(my $iC = $oWkS->{MinCol} ; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxC +ol} ; $iC++) { $oWkC = $oWkS->{Cells}[$iR][$iC]; next unless $oWkC; if($oWkC->is_merged){ if($oWkC->Value){ print "COPY V=$oWkC->{Format}->{AlignV} H=$oWkC->{ +Format}->{AlignH}\n"; for my $i (1..$oWkC->{Format}->{AlignV}){ print "ADDV [$iR+$i][$iC] \n"; $oWkS->{Cells}[$iR+$i][$iC]->{_Value} = $oWkC- +>Value; $oWkS->{Cells}[$iR+$i][$iC]->{Merged}=0; } for my $i (1..$oWkC->{Format}->{AlignH}){ print "ADDH [$iR][$iC+$i] \n"; $oWkS->{Cells}[$iR][$iC+$i]->{_Value} = $oWkC- +>Value; $oWkS->{Cells}[$iR][$iC+$i]->{Merged}=0; } } } print "( $iR , $iC ) ==", $oWkC->Value, " ", $oWkC->{Merge +d},"\n"; } } }

    This should copy/clone your spanned data... while still being able to use Simple...

    in my case, my output was:

    --------- SHEET:Sheet1 COPY V=2 H=0 ADDV [0+1][0] ADDV [0+2][0] ( 0 , 0 ) ==cat 1 ( 0 , 1 ) ==Fluffy ( 1 , 0 ) ==cat 0 ( 1 , 1 ) ==Vera ( 2 , 0 ) ==cat 0 ( 2 , 1 ) ==Manxie cat Fluffy cat Vera cat Manxie

    edit: can you send me your xls example where the cell is at the second row, Libreoffice forces the data from the rightmost/topmost...

    edit2: Ok, merged cells that already had a value, and my subroutine dies because the AlignH/AlignV are messed up. This will need some more tinkering before it works properly. (but need to sleep now)

      Used poj's knowhow to make it work. (what was I thinking yesterday evening, AlignH is for indentation, not merged cells). Code is smaller to boot!

      use Spreadsheet::ParseExcel::Simple; my $xls = Spreadsheet::ParseExcel::Simple->read('excel.xls'); foreach my $sheet ($xls->sheets) { #unmerge unmerge($sheet->{sheet}); while ($sheet->has_data) { my @data = $sheet->next_row; print join("\t", @data),"\n"; } } sub unmerge{ my($oWkS) =@_; print "--------- SHEETNAME:", $oWkS->{Name}, "\n"; my $range = $oWkS->get_merged_areas(); for (@$range){ print " start row : $_->[0] start col : $_->[1] end row : $_->[2] end col : $_->[3] \n\n"; $oWkC = $oWkS->{Cells}[$_->[0]][$_->[1]]; for my $iR ($_->[0] .. $_->[2]){ for my $iC ($_->[1] .. $_->[3]){ next if ($iR eq $_->[0] && $iC eq $_->[1]); print "SET [$iR][$iC] <= [$_->[0]][$_->[1]]\n"; $oWkS->{Cells}[$iR][$iC]->{_Value} = $oWkC->Value; } } } print "--------- DONE Unmerging.\n"; }

      Output:

      --------- SHEETNAME:Sheet1 start row : 0 start col : 0 end row : 2 end col : 0 SET [1][0] <= [0][0] SET [2][0] <= [0][0] --------- DONE Unmerging. cat Fluffy cat Vera cat Manxie

      Caveat: The {_Value} is still not guaranteed to be in the topleft cell from a range (as misterperl had it in the second column), so the code might need to be expanded to first SEARCH for a defined {_Value} in the range. Then have that as the source data cell, then iterate over the range writing all other cells that are undefined...

      You can access the functions from within Simple like:

      $sheet->{sheet}->get_merged_areas();

      Thank You, great example! From the hints here I wrote this short solution (using the non-simple ParseExcel) :
      # fill in merged cells my $ma = $page->{sheet}->{MergedArea}; for my $a ( @$ma ) { my ( $tr, $lc, $br, $rc ) = @$a; # propogate values top left to bottom right my $p = $page->{sheet}->{Cells}->[$tr]->[$lc]->{Val}; for ( my $row=$tr; $row<=$br; $row++ ) { for ( my $col=$lc; $col<=$rc; $col++ ) { $page->{sheet}->{Cells}->[$row]->[$col]->{Val}=$p; } } } # for $a
      Seems to do the trick! I've seen some cases where the cell hash also has another Val which is something like _Val. I was never sure why those were there but I've seen examples where it contains the actual val. So in that case, $p may have to be assigned differently... Thanks all. I was floundering until you pointed out all the "merged" stuff. Seems like an odd term- I would have thought they'd use "span" not "merge"..
        Ah, when I was reading the code, I found this explanation about Val and _Value in ~/perl5/lib/perl5/Spreadsheet/ParseExcel/Cell.pm

        ############################################################ # # value() # # Returns the formatted value of the cell. # sub value { my $self = shift; return $self->{_Value}; } ############################################################ # # unformatted() # # Returns the unformatted value of the cell. # sub unformatted { my $self = shift; return $self->{Val}; } ############################################################

        You see, sometimes a cell has a formula, and it holds the formula string, but also a resulting value. So Val holds =sum(A1+1) and _Value holds "42". It was/is? a problem in a XLS writer module, where only one was filled in, so you were required to open the spreadsheet in excel, let it calculate the values, then save it again in order to keep on processing the sheet.

      ..oddly enough, after pushing the value into the {Val} and {_Value} fields, I figured I was good to go. But I then used:
      my @row=$page->next_row;
      and all of the vals I propagated in are missing.

      Yet if I inspect the cell Val at the row and col, they are all correct. So I had to manually push values into the array:

      for ( my $c=0; $c<=$maxCol; $c++ ) { next if $row[$c] =~ /./; $row[$c]=$page->{sheet}->[$row]->[$c]->{Val}; }
      Sure didn't expect that! * * * Followup- not sure why I was seeing that- but now everything is behaving without that step.. YAY thanks guys you were all a great help.
        See also previous explanation:

        The perl module uses the function Value() to retrieve the value from the variable {_Value}. So setting {Val} will not work with next_row().

        sub next_row { map { $_ ? $_->Value : "" } @{$_[0]->{sheet}->{Cells} }

        Now, because a spanned cell COULD hold a value() BEFORE spanning (usually empty, but not necessarily), you will get that value() instead of the spanned value...

        spanning is like mounting a drive over a directory that may already hold files.... but it works now, congrats!

Re: rowspan vals in Excel SS are not interpreted by ParseExcel::Simple
by Anonymous Monk on Feb 08, 2017 at 20:24 UTC
    I used Spreaadsheet::Read (with Spreadsheet::ParseXLSX):
    use Data::Dumper; use Spreadsheet::Read; my $book = ReadData ("foo.xlsx"); print Dumper [ Spreadsheet::Read::rows($book->[1]) ];
    I used "Merge and Center" so that my 2 left most columns (cat and dog) each had 3 cells to the right and the output was:
    $VAR1 = [ [ 'cat', 'foo' ], [ '', 'bar' ], [ '', 'baz' ], [ 'dog', 'one' ], [ '', 'two' ], [ '', 'tree' ] ];
    This seems more ideal, but you'll still need to determine where the cut off is. You might be a victim of "garbage in/garbage out" as well. :(

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1181449]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (6)
As of 2024-04-25 08:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found