Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Spreadsheet::WriteExcel::Big is Bigger

by shockme (Chaplain)
on Jan 23, 2002 at 20:42 UTC ( [id://140928]=perlquestion: print w/replies, xml ) Need Help??

shockme has asked for the wisdom of the Perl Monks concerning the following question:

I have a comma delimited file which upper management imports into an Excel file. I figured I'd show some initiative and write a script to create the Excel file for them.

I'd never used Spreadsheet::WriteExcel before, but it was a breeze. The spreadsheet is exactly the way management uses it. The only difference is the size of the file. The file created by my perl script is much larger than the file created by importing the comma delimited file. Here's the breakdown:

  • WriteExcel: 7.8MB
  • Comma delimited: 6.3MB

Following is the script I'm using to create the file. I'd like to hear any suggestions regarding reducing the size of the Excel file. Like I said, I've never played with the module before, so I may be missing some obvious efficiency issues.

#!/usr/bin/perl -w use strict; use Spreadsheet::WriteExcel::Big; open (IN, "temp.txt") || die "Cannot open temp.txt\n"; # set up Excel spreadsheet my $workbook = Spreadsheet::WriteExcel::Big->new("temp.xls"); my $worksheet = $workbook->addworksheet(); my $inLine; my $row = 0; while ($inLine = <IN>) { my ($date, $log, $protocol, $firstIP, $secondIP, $packets) = split(/,/, $inLine); # strip port number if it exists if ($firstIP =~ /([\d.]+)\((\d+)\)/) { my $ip1 = $1; my $port1 = $2; } else { my $ip1 = $firstIP; my $port1 = " "; } if ($secondIP =~ /([\d.]+)\((\d+)\)/) { my $ip2 = $1; my $port2 = $2; } else { my $ip2 = $secondIP; my $port2 = " "; } push my @XLSarray, ($date, $log, $protocol, $ip1, $port1, $ip2, $port2, $pac +kets); # print information to Excel spreadsheet my $col = 0; foreach my $XLSelement (@XLSarray) { $worksheet->write($row, $col, $XLSelement); $col++; } $row++; } $workbook->close();

If things get any worse, I'll have to ask you to stop helping me.

Replies are listed 'Best First'.
Re: Spreadsheet::WriteExcel::Big is Bigger
by jmcnamara (Monsignor) on Jan 23, 2002 at 21:18 UTC

    Excel uses several optimisations to reduce the size of it's files. One is to store 64bit floats in 32bits if possible.* Spreadsheet::WriteExcel trades this space for speed (and simplicity) and writes all numbers as 64 bit floats.

    Excel also stores repeated strings in a hash table. This can give a big space saving for certain types of data. Spreadsheet::WriteExcel will do this in a later version but probably only for long strings.

    There are a few other optimisation as well. When you save the file, Excel will perform the optimisations and the file size will reduce.

    Note, you can write an array of data in one go by passing it as an arrayref:

    # print information to Excel spreadsheet $worksheet->write($row, 0, \@XLSarray); $row++;

    --
    John.

    * For the anoraks. Actually, where possible, it stores ints and floats in 30 bits and uses the other two bits to indicate the type of stored variable.

      Thanks John. Opening it with Excel and saving it did reduce the size of the file.
      • WriteExcel: 6.85MB
      • Comma delimited: 6.24MB
      Cool stuff! Thanks again.

      If things get any worse, I'll have to ask you to stop helping me.

Re: Spreadsheet::WriteExcel::Big is Bigger
by talexb (Chancellor) on Jan 23, 2002 at 21:32 UTC
    I find that the portion of your code like
    my $ip1 = $1; my $port1 = $2;
    is suspect. As soon as this scope ends, these two variables (and their values, of course) disappear, yet you are using them in the push statement. I see you are using use strict; -- doesn't Perl complain about these undefined values?

    I would declare those two variables up higher, near the split statement -- then they stay in scope long enough to be used by the push statement.

    Sure hope I'm not missing anything obvious here. :(

    --t. alex

    "Of course, you realize that this means war." -- Bugs Bunny.

      Nope, you're not missing a thing. I uploaded the wrong version of my script. Here's the portion you're questioning from the actual script:
      while ($inLine = <IN>) { my ($ip1, $port1, $ip2, $port2) = 0; my ($date, $log, $protocol, $firstIP, $secondIP, $packets) = split(/,/, $inLine); # strip port number if it exists if ($firstIP =~ /([\d.]+)\((\d+)\)/) { $ip1 = $1; $port1 = $2; } else { $ip1 = $firstIP; $port1 = " "; } if ($secondIP =~ /([\d.]+)\((\d+)\)/) { $ip2 = $1; $port2 = $2; } else { $ip2 = $secondIP; $port2 = " "; }

      After a quick review, that seems to be the only difference. Sorry for the confusion.

      If things get any worse, I'll have to ask you to stop helping me.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://140928]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2024-04-19 20:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found