Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Print Only header Excel

by TheStig (Novice)
on May 30, 2014 at 19:52 UTC ( [id://1088062]=perlquestion: print w/replies, xml ) Need Help??

TheStig has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I come seeking knowledge for my problem. I am trying to parse data from a .xlsx file and get it into a .csv file. This is normally not a problem, but in this file I am getting they have put a header on it that only will appear in Excel when you print the spreadsheet. It is not a normal header that I can just go in and show. Of course they have put data in this header that we need. The following is my code that I am using to see that I am currently parsing what I need from the excel file. I know it is inefficient, but I just started writing any perl a couple days ago.

#!/usr/bin/perl -w use warnings; use strict; use Spreadsheet::XLSX; use Date::Format; my $fileName = "file path GOES HERE"; my $workbook = Spreadsheet::XLSX->new($fileName) or die "Unable to rea +d the Excel file: $@$!"; for my $worksheet($workbook->worksheet("AMD 87 MAY 14")){ my($firstRow, $lastRow) = $worksheet->row_range(); my($firstCol, $lastCol) = $worksheet->col_range(); for my $row($firstRow .. $lastRow){ for my $col($firstCol .. $lastCol){ my $cell = $worksheet->get_cell($row, $col); next unless $cell; print "Row, Col = ($row, $col)\n"; print "Value = ", $cell->value(). "\n"; } } }

I am unsure if it is even possible to parse that print only header in the file, but I was guessing if anyone knew if it could happen it would be you guys.

Replies are listed 'Best First'.
Re: Print Only header Excel
by Tux (Canon) on May 31, 2014 at 08:19 UTC
Re: Print Only header Excel
by Corion (Patriarch) on May 30, 2014 at 21:01 UTC

    XLSX files are just ZIP files with XML in them. If Spreadsheet::XLSX does not give you access to the print headers, you can always unzip the archive and read the XML yourself to find the relevant data.

Re: Print Only header Excel
by GotToBTru (Prior) on May 30, 2014 at 20:01 UTC

    Use Excel to print the spreadsheet to a file, and work from that instead of the spreadsheet itself? Or, assuming the spreadsheet gets this special data displayed in the header from within itself, you just need to know where to look to get it. Examining the header/footer definitions should reveal this.

    1 Peter 4:10
Re: Print Only header Excel
by InfiniteSilence (Curate) on May 30, 2014 at 20:59 UTC

    Are you the real Stig?

    Celebrate Intellectual Diversity

Re: Print Only header Excel
by TheStig (Novice) on Jun 02, 2014 at 15:17 UTC

    Thank you for the help. I am able to read the file much faster using ParseXLSX. I unzipped my Excel file and was able to find my header in the XML for it.

    Now I am trying to write a script that will unzip the Excel file for me so then I can access the oddHeader tag and get my needed data. I am currently trying to use Archive::Zip to help me, but I am getting errors and I have been unable to unzip the file so far using my script.

    Am I once again using an out of date library and I should use something else, or is there a boilerplate piece of code that most people just always use to do this that I can put in my script?

      Perhaps you can adapt this
      #!perl use strict; use Archive::Zip::MemberRead; use XML::Twig; use Text::CSV; Archive::Zip::MemberRead->setLineEnd(); my $csv = Text::CSV->new ( {binary=>1, eol=>"\012"} ) or die "Cannot use CSV: ".Text::CSV->error_diag(); my $infile = 'header1.xlsx'; my $outfile = 'results.csv'; open my $fh_out,'>',$outfile or die "Could not open $outfile: $!"; my @sheet; my $zip = Archive::Zip->new($infile); for my $name ($zip->memberNames()){ if ($name =~ /sheet(\d+).xml$/){ $sheet[$1] = $name; } } my @fields = ('oddHeader','evenHeader','oddFooter','evenFooter'); $csv->print($fh_out, [$infile,scalar localtime()]); $csv->print($fh_out, ['Sheet',@fields]); my %text=(); for my $no (1..$#sheet) { %text=(); process_sheet( $sheet[$no] ); my @row = map{ $text{$_} } @fields; $csv->print($fh_out, [$no,@row]); } $fh_out->close or die "$!"; print "Results written to $outfile\n"; sub process_sheet { my $filename = shift; print "Extracting text from : $filename\n"; my $fh_in = Archive::Zip::MemberRead->new($zip, $filename); my $xml = $fh_in->getline(); $fh_in->close(); my $twig = XML::Twig->new( keep_spaces=>1, twig_roots => { 'headerFooter' => \&get_text }, ); $twig->parse( $xml ); } sub get_text { my ($t,$elt) = @_; for ($elt->children){ my $text = $_->text; # remove left center right format codes $text =~ s/&[LCR]/ /g; $text{$_->name} = $text ; } }
      poj

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1088062]
Approved by taint
Front-paged by GotToBTru
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (2)
As of 2024-04-19 19:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found