Re: Read Excel cell comments?
by misterwhipple (Monk) on Apr 02, 2007 at 14:56 UTC
|
You could try opening the spreadsheet with Excel, and saving it as an XML spreadsheet. The XML should be much easier to parse.
I just did a test using Excel 2003, and the cell comments are preserved. Here is the relevant portion of the XML. (I've changed the whitespace and removed font tags for clarity)
<Row>
<Cell>
<Data ss:Type="String">This is a cell</Data>
<Comment ss:Author="Throckmorton P. Ruddygore">
<ss:Data xmlns="http://www.w3.org/TR/REC-html40">
<B>Throckmorton P. Ruddygore:</B>
This is a comment
</ss:Data>
</Comment>
</Cell>
</Row>
If you have an inconveniently large number of spreadsheets to work with, perhaps you could use OLE to automate the conversion to XML.
cat >~/.sig </dev/interesting
| [reply] [d/l] |
|
While being bound to Excel 11 (aka 2003) 10 (aka 2002)*) and above might be bearable — depending on your environment and what you need it for —, the more severe problem I found with this approach is the internal optimization Excel performs with empty cells.
Excel will drop <Cell> tags if no "significant" data or format information has to be saved there. Instead the next cell in the output will get an attribute to indicate its column position in the worksheet instead.
Suppose you have a row which looks similar to
Then your XMLSS output might look somewhat like:
<Row>
<Cell>
<Data ss:Type="String">data</Data>
</Cell>
<Cell ss:Index="3">
<Data ss:Type="String">more data</Data>
</Cell>
...
which makes it harder than necessary for my taste to navigate to the cells in search for.
Update: *) I now think XML-Spreadsheet (XMLSS) export was introduced with Excel 10 ->MSDN | [reply] [d/l] [select] |
|
The spreadsheets are coming from someone else... I only have Excel 9/2000 and it seems XML is not one of the output formats... not too interested in giving MS more money for something I don't want to use anyway...
Thanks for the warning about the empty cell optimization... I'm sure it is helpful for large, sparse spreadsheets, but at least it looks way better than the contorted HTML Excel 9 produces... although most of the bizarrest stuff is header boiler plate, and much of it can be ignored for my application... but the HTML version also doesn't export comments....
| [reply] |
Re: Read Excel cell comments?
by veeruch (Sexton) on Apr 02, 2007 at 12:59 UTC
|
you can create,read,insert,update in to excel.
for update into Excel:
use Spreadsheet::ParseExcel::SaveParser;
Below code is reading Excel data from CELL
use Spreadsheet::ParseExcel;<br>
my $oBook = $oExcel->Parse("test.xls");
$oWkS = $oBook->{Worksheet}[0];
for(my $iR = 0;defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR
+++) {
$oWkC0 = $oWkS->{Cells}[$iR][0];
$oWkC1 = $oWkS->{Cells}[$iR][1];
if ($oWkC0->{Val}){
print "$oWkC0->{Val}-------$oWkC1\n";
}
}
regards
veeru
"The fragrance of flowers spreads only in the direction of the wind. But the goodness of a person spreads in all direction" | [reply] [d/l] |
Re: Read Excel cell comments?
by Ionitor (Scribe) on Apr 02, 2007 at 13:04 UTC
|
Depending on how much time and energy you want to put into this, one option might be to write a program to extract the comments yourself from a version of the Excel file saved as HTML. This is probably easier than attempting to extract the comments from the .xls file.
Otherwise, I'd look for ways to speed up the OLE version. Maybe you can use ParseExcel to get cell values and use OLE to extract the comments?
| [reply] |
|
Yeah, the hybrid approach might be workable... too bad ParseExcel doesn't seem to read cell comments... I'll have to see how much time it saves vs using OLE for everything.
| [reply] |
|
| [reply] |
Re: Read Excel cell comments?
by toma (Vicar) on Apr 03, 2007 at 06:56 UTC
|
If you have and old Excel and you want to make XML, you can use OpenOffice instead. Download OpenOffice for free.
See my page (external link) Perl and OpenOffice. This page links to a tutorial and various example programs.
I just checked, and the OpenOffice format does include the comments in the XML. Here is the synopsis: save the Excel file as a .ods file, unzip it, and parse content.xml.
It should work perfectly the first time! - toma
| [reply] |
|
Indeed!
I had recently installed OpenOffice figuring to try it out sometime, so this became the time...
Took two minutes to convert from .xls to .xml (spreadsheet is about 4MB), plus it is a manual step, so that would mean that of the 9 minutes of remaining time, reading .xml could save at most 7, because of the need to convert .xls to .xml first.
Is there anything around that is similar to Spreadsheet::ParseExcel, for the .xml files, or would this be a brand new project starting from scratch (and the XML parser)?
Might it not be easier to add comment parsing to Spreadsheet::ParseExcel? I haven't looked at its internals, and I currently know nothing of the Excel file format except that it seems to be documented somewhere, using a format called BIFF (which conjures up potential relationships of something called TIFF, but that may be coincidental)... so that might be like a brand new project starting from scratch too...
| [reply] |
|
You can convince OpenOffice to read the file and save the XML from the command line. You will have to write a few lines of code in the OpenOffice scripting language to do this. An example is in the paper on the web page that I referenced above 'Using Perl to Read and Write OpenOffice Documents.' See section #2 of the paper, 'Generating Web Banners' for example code and references for automating OpenOffice tasks.
For XML parsing I used XML::Twig. There are examples of how to do this in the paper. It is not hard, but XML is a large topic. You could easily use one of the many simpler XML modules for this task.
What you want to do is easy with this approach. For example, I wrote a program to extract the speaker notes from a PowerPoint presentation. See Section #1 of the paper for details. It took 20 minutes to implement the whole thing. Learning how to use this approach took much longer, but my paper should have enough to get you started.
It should work perfectly the first time! - toma
| [reply] |
Re: Read Excel cell comments?
by zentara (Archbishop) on Apr 02, 2007 at 18:40 UTC
|
| [reply] |
|
Hmm. Can't figure out how to read comments using this tool...
| [reply] |
|
Doh! My bad. I don't use Windows and it shows. :-)
| [reply] |