Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Excel2Text

by stonecolddevin (Parson)
on Dec 18, 2006 at 21:58 UTC ( [id://590565]=sourcecode: print w/replies, xml ) Need Help??
Category: Utility Scripts
Author/Contact Info Devin Austin devin.austin@timorperfectus.com
Description: Simple conversion script for Excel to a pipe ("|") delimited text file. Takes two arguments, the filename (can be relative) and the directory to save the text file to.

#!perl -w
use warnings;
use strict;
## Author: Devin Austin
## Excel to Text script
## very simple script...
## enter file name and get going!
## Usage: convert.pl <excel file> <directory to save text file to>
## Thanks to GrandFather for pretty much re-factoring my original code
## into this version.
## mostly straight out of the docs...

use Spreadsheet::ParseExcel;
my $oExcel = new Spreadsheet::ParseExcel;
my $dir = $ARGV[1];
chomp $dir;

unless (-d $dir) {
   print "Can't find directory $dir";
   exit;
}

my $filename = "$dir/$ARGV[0]";
chomp $filename;

unless (-e $filename) {
   print "Can't find file $filename";
   exit ;
}

my $fullfilename = "$filename.txt";
## start work
print "Converting...";
#1.1 Normal Excel97
open E2T, ">", $fullfilename or die $!;

my $oBook = Spreadsheet::ParseExcel::Workbook->Parse($filename);
my($iR, $iC, $oWkS, $oWkC);

foreach my $oWkS (@{$oBook->{Worksheet}}) {
   
    print "--------- SHEET:", $oWkS->{Name}, "\n";
    print E2T $oWkS->{Name}, "|";
    next unless defined $oWkS->{MinRow} and defined $oWkS->{MaxRow};
   
    for my $iR ($oWkS->{MinRow} .. $oWkS->{MaxRow}) {
      for my $iC ($oWkS->{MinCol} .. $oWkS->{MaxCol}) {
          $oWkC = $oWkS->{Cells}[$iR][$iC];
          next if ! defined $oWkC;
         
          print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);
          print E2T $oWkC->Value, "|";
      }
   }
   
   print E2T "\n";
}
close E2T;

print "Finished!";
exit;
Replies are listed 'Best First'.
Re: Excel2Text
by mikeB (Friar) on Dec 19, 2006 at 15:59 UTC
    In my experience, no matter what you choose for a delimiter, that character will eventually show up in the input, leading to problems parsing the generated output. You might want to escape any delimiter characters before writing the output. Alternatively, if you can get along with CSV files instead of '|' delimited files, check out Class::CSV.

      I'm not exactly certain what you mean by this reply. The "|" character is split on and only the values in between the "|" characters are used. Maybe I'm not understanding your reply, but this script's goal is to take all the fields from an Excel file and write them to a text file, separating each field by the "|" character and each row by a newline ("\n"). UPDATE: Misunderstood mikeB's comment. Yes, things can get ugly if there's a "|" in the Excel sheet like GrandFather said. However, it's just one of many options, delimit it with what you like, or you can use Class::CSV.

      meh.

        If a cell contains text with a | in it then things go a little pear shaped ;).


        DWIM is Perl's answer to Gödel
Re: Excel2Text
by Tux (Canon) on Jan 02, 2007 at 14:55 UTC

    That and much more is already supported in Spreadsheet::Read's xlscat utility. See -s option to alter the default | to whatever you like, and -c and -m for CSV output:

    usage: xlscat [-s <sep>] [-L] [-u] [ Selection ] file.xls [-c | -m] [-u] [ Selection ] file.xls -i [ -S sheets ] file.xls Generic options: -v[#] Set verbose level (xlscat) -d[#] Set debug level (Spreadsheet::Read) -u Use unformatted values --noclip Do not strip empty sheets and trailing empty rows and columns Input CSV: --in-sep=c Set input sep_char for CSV Output Text (default): -s <sep> Use separator <sep>. Default '|', \n allowed -L Line up the columns Output Index only: -i Show sheet names and size only Output CSV: -c Output CSV, separator = ',' -m Output CSV, separator = ';' Selection: -S <sheets> Only print sheets <sheets>. 'all' is a valid set Default only prints the first sheet -R <rows> Only print rows <rows>. Default is 'all' -C <cols> Only print columns <cols>. Default is 'all' -F <flds> Only fields <flds> e.g. -FA3,B16

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: sourcecode [id://590565]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2024-04-19 23:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found