http://qs321.pair.com?node_id=797289


in reply to Writing International Phonetic Alphabet symbols to Excel?

You're probably going to have to encode the symbols in Microsoft's own character set. I've done a lot of coding the other way round - extracting symbols from Excel and trying to put them in a Postgres database with UTF8 encoding. You need to find the matching symbol in, say, "Character Map", find the ord() of that, and change the value of the input character to that ord value.

rdfield

Replies are listed 'Best First'.
Re^2: Writing International Phonetic Alphabet symbols to Excel?
by Porculus (Hermit) on Sep 24, 2009 at 20:46 UTC

    There is no "Microsoft's own character set" to worry about. Excel uses totally standard Unicode internally.

    I've done quite a lot recently writing Unicode text into Excel files with Perl on Solaris, and I've never had to worry about encodings at all.

      Looks like it was just the CSV files generated from Excel - working with the spreadsheet directly sounds like a better option.

      rdfield

      After doing some testing on this, it looks like MS uses latin1 (or something very close) internally.

      I used a slightly modified version of the example script from Spreadsheet::ParseExcel, replacing the "print" with a database insert.

      When processing some data from a spreadsheet into a PostgreSQL database, cells with symbols such as 0xae (ascii 92, the "registered" symbol, ®), I constantly came up against the database error:
      DBD::Pg::db do failed: ERROR: invalid byte sequence for encoding "UTF +8": 0xae
      After setting the client encoding to latin1 (keeping the database at UTF8):
      $dbh->do("set client_encoding to latin1");

      the data went in OK.

      If there is a different/better way to process this, I'd be interested to know.

      Update: there is a better way...
      #!/usr/bin/perl use warnings; use strict; use Spreadsheet::ParseExcel; my $parser = Spreadsheet::ParseExcel->new(); my $workbook = $parser->Parse('Book1.xls'); binmode(STDOUT, ":utf8"); foreach my $worksheet ( $workbook->worksheets() ) { my ( $row_min, $row_max ) = $worksheet->row_range(); my ( $col_min, $col_max ) = $worksheet->col_range(); foreach my $row ( 1 .. $row_max ) { foreach my $col ( $col_min .. $col_max ) { my $cell = $worksheet->get_cell( $row, $col ); next unless $cell; next unless defined($col_mapping{$col}); my $value = $cell->value(); utf8::upgrade($value); ... store_in_database($value); ... } } }

      rdfield