Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Read Excel cell comments?

by AlwaysLearning (Sexton)
on Apr 02, 2007 at 07:42 UTC ( [id://607772]=perlquestion: print w/replies, xml ) Need Help??

AlwaysLearning has asked for the wisdom of the Perl Monks concerning the following question:

I have an extremely slow script that successfully reads cell values and comments, using OLE via Excel.Application...

I've heard about Spreadsheet::ParseExcel, and wondered if it would be faster, but don't see a way to read the cell comments.... I Data::Dumper'ed the results of Parse, and alas! the comments were not found anywhere!

I've read the docs, but maybe I missed something? WriteExcel allows writing comments... is ReadExcel more limited?

Couldn't find another module that reads XLS files...

Any ideas on where to go next?

The solution

Convert all the data using OpenOffice, and an OpenOffice Calc Macro, which converts the Excel comments to columns, and emits a UTF-8 TSV file. UTF-8 TSV files are a great format, as long as there are no tabs in the data, and Excel makes it reasonably hard to put tabs in the data... OpenOffice can convert to TSV in about 2 minutes, and Perl can read the TSV in about 2 seconds. So for the overall process, from 20 minutes down to 2 minutes!

Replies are listed 'Best First'.
Re: Read Excel cell comments?
by misterwhipple (Monk) on Apr 02, 2007 at 14:56 UTC
    You could try opening the spreadsheet with Excel, and saving it as an XML spreadsheet. The XML should be much easier to parse.

    I just did a test using Excel 2003, and the cell comments are preserved. Here is the relevant portion of the XML. (I've changed the whitespace and removed font tags for clarity)

    <Row> <Cell> <Data ss:Type="String">This is a cell</Data> <Comment ss:Author="Throckmorton P. Ruddygore"> <ss:Data xmlns="http://www.w3.org/TR/REC-html40"> <B>Throckmorton P. Ruddygore:</B>&#10; This is a comment </ss:Data> </Comment> </Cell> </Row>

    If you have an inconveniently large number of spreadsheets to work with, perhaps you could use OLE to automate the conversion to XML.

    cat >~/.sig </dev/interesting
      While being bound to Excel 11 (aka 2003) 10 (aka 2002)*) and above might be bearable — depending on your environment and what you need it for —, the more severe problem I found with this approach is the internal optimization Excel performs with empty cells.

      Excel will drop <Cell> tags if no "significant" data or format information has to be saved there. Instead the next cell in the output will get an attribute to indicate its column position in the worksheet instead.

      Suppose you have a row which looks similar to
      data     more data

      Then your XMLSS output might look somewhat like:

      <Row> <Cell> <Data ss:Type="String">data</Data> </Cell> <Cell ss:Index="3"> <Data ss:Type="String">more data</Data> </Cell> ...

      which makes it harder than necessary for my taste to navigate to the cells in search for.

      Update: *) I now think XML-Spreadsheet (XMLSS) export was introduced with Excel 10 ->MSDN

        The spreadsheets are coming from someone else... I only have Excel 9/2000 and it seems XML is not one of the output formats... not too interested in giving MS more money for something I don't want to use anyway... Thanks for the warning about the empty cell optimization... I'm sure it is helpful for large, sparse spreadsheets, but at least it looks way better than the contorted HTML Excel 9 produces... although most of the bizarrest stuff is header boiler plate, and much of it can be ignored for my application... but the HTML version also doesn't export comments....
Re: Read Excel cell comments?
by veeruch (Sexton) on Apr 02, 2007 at 12:59 UTC
    you can create,read,insert,update in to excel.
    for update into Excel:
    use Spreadsheet::ParseExcel::SaveParser;
    Below code is reading Excel data from CELL
    use Spreadsheet::ParseExcel;<br> my $oBook = $oExcel->Parse("test.xls"); $oWkS = $oBook->{Worksheet}[0]; for(my $iR = 0;defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR +++) { $oWkC0 = $oWkS->{Cells}[$iR][0]; $oWkC1 = $oWkS->{Cells}[$iR][1]; if ($oWkC0->{Val}){ print "$oWkC0->{Val}-------$oWkC1\n"; } }
    regards
    veeru
    "The fragrance of flowers spreads only in the direction of the wind. But the goodness of a person spreads in all direction"
Re: Read Excel cell comments?
by Ionitor (Scribe) on Apr 02, 2007 at 13:04 UTC

    Depending on how much time and energy you want to put into this, one option might be to write a program to extract the comments yourself from a version of the Excel file saved as HTML. This is probably easier than attempting to extract the comments from the .xls file.

    Otherwise, I'd look for ways to speed up the OLE version. Maybe you can use ParseExcel to get cell values and use OLE to extract the comments?

      Yeah, the hybrid approach might be workable... too bad ParseExcel doesn't seem to read cell comments... I'll have to see how much time it saves vs using OLE for everything.
        OK, the hybrid approach saves 1/2 the time.

        Sure wish ParseExcel would read comments, that would save the 90% of the rest of the time, I suspect...

Re: Read Excel cell comments?
by toma (Vicar) on Apr 03, 2007 at 06:56 UTC
    If you have and old Excel and you want to make XML, you can use OpenOffice instead. Download OpenOffice for free.

    See my page (external link) Perl and OpenOffice. This page links to a tutorial and various example programs. I just checked, and the OpenOffice format does include the comments in the XML. Here is the synopsis: save the Excel file as a .ods file, unzip it, and parse content.xml.

    It should work perfectly the first time! - toma
      Indeed!

      I had recently installed OpenOffice figuring to try it out sometime, so this became the time...

      Took two minutes to convert from .xls to .xml (spreadsheet is about 4MB), plus it is a manual step, so that would mean that of the 9 minutes of remaining time, reading .xml could save at most 7, because of the need to convert .xls to .xml first.

      Is there anything around that is similar to Spreadsheet::ParseExcel, for the .xml files, or would this be a brand new project starting from scratch (and the XML parser)?

      Might it not be easier to add comment parsing to Spreadsheet::ParseExcel? I haven't looked at its internals, and I currently know nothing of the Excel file format except that it seems to be documented somewhere, using a format called BIFF (which conjures up potential relationships of something called TIFF, but that may be coincidental)... so that might be like a brand new project starting from scratch too...

        You can convince OpenOffice to read the file and save the XML from the command line. You will have to write a few lines of code in the OpenOffice scripting language to do this. An example is in the paper on the web page that I referenced above 'Using Perl to Read and Write OpenOffice Documents.' See section #2 of the paper, 'Generating Web Banners' for example code and references for automating OpenOffice tasks.

        For XML parsing I used XML::Twig. There are examples of how to do this in the paper. It is not hard, but XML is a large topic. You could easily use one of the many simpler XML modules for this task.

        What you want to do is easy with this approach. For example, I wrote a program to extract the speaker notes from a PowerPoint presentation. See Section #1 of the paper for details. It took 20 minutes to implement the whole thing. Learning how to use this approach took much longer, but my paper should have enough to get you started.

        It should work perfectly the first time! - toma
Re: Read Excel cell comments?
by zentara (Archbishop) on Apr 02, 2007 at 18:40 UTC
      Hmm. Can't figure out how to read comments using this tool...
        Doh! My bad. I don't use Windows and it shows. :-)

        I'm not really a human, but I play one on earth. Cogito ergo sum a bum

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://607772]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others sharing their wisdom with the Monastery: (3)
As of 2024-04-19 22:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found