in reply to Character coding issues with Spreadsheet::XLSX

If M$ decided that Excel 2007 would not change the way Unicode is handled in spreadsheets, then this might help you out: xls2tsv uses the older Spreadsheet::ParseExcel, but if the Unicode handling hasn't changed, it should give you a consistent clue about when you need to decode() from UTF-16BE into utf8 to get what you want.
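For illustration, here is a minimal sketch of that kind of conversion using the core Encode module. The raw value is a made-up stand-in for what a cell might hold, and whether a given cell really arrives as UTF-16BE is exactly the thing you would have to check for your own data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw cell value: "é" (U+00E9) stored as UTF-16BE bytes.
my $raw = "\x00\xE9";

# Turn the UTF-16BE bytes into a Perl character string.
my $text = decode('UTF-16BE', $raw);

# Encode the character string as UTF-8 bytes for output.
my $utf8 = encode('UTF-8', $text);

printf "%d character(s), %d UTF-8 byte(s)\n", length($text), length($utf8);
# prints "1 character(s), 2 UTF-8 byte(s)"
```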

Then again, if M$ did decide to change their Unicode handling in Excel, you might need to get some sort of hex-dump picture of the character data in the cells of interest. Save a spreadsheet with known non-ASCII characters in selected cells, compare the dumped values against the code points you expect, and you should be able to work out what needs to be done.
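One rough way to get that hex-dump picture is sketched below. hex_dump() is a hypothetical helper written for this example, not something provided by Spreadsheet::XLSX or Spreadsheet::ParseExcel, and the sample value is a stand-in for whatever your parser hands back for a cell.

```perl
use strict;
use warnings;

# Hypothetical helper: show every character of a value as a hex code point.
sub hex_dump {
    my ($value) = @_;
    return join ' ', map { sprintf '%04X', ord } split //, $value;
}

# Stand-in for a cell value: "café" as UTF-8 bytes that were never decoded.
my $cell = "caf\x{C3}\x{A9}";

print hex_dump($cell), "\n";
# prints "0063 0061 0066 00C3 00A9"
```

If a cell known to contain "é" dumps as 00C3 00A9, you are probably looking at undecoded UTF-8 bytes; a single 00E9 would mean the value is already a character string; a 0000 in front of each character would point toward UTF-16BE.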

Re^2: Character coding issues with Spreadsheet::XLSX
by suaveant (Parson) on Feb 20, 2009 at 21:00 UTC
    Well... that wasn't exactly it but was close enough to get me there... decode('utf8',$val) did it for me, thanks!
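The post does not say which module supplied decode(); the sketch below assumes it is the decode() exported by the core Encode module, which accepts this calling form. The sample bytes are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);   # assumption: decode() comes from Encode

# Hypothetical cell value: the two UTF-8 bytes that encode "é".
my $val = "\xC3\xA9";

# Interpret the bytes as UTF-8 and get back a one-character string.
my $str = decode('utf8', $val);

printf "U+%04X, length %d\n", ord($str), length($str);
# prints "U+00E9, length 1"
```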

                    - Ant

      Where did you get the decode() routine from? Which Perl module did you use? We are having the same problem, but your answer was incomplete. Thanks in advance for your help, Jodyman