in reply to Re^2: Writing International Phonetic Alphabet symbols to Excel?
in thread Writing International Phonetic Alphabet symbols to Excel?
After doing some testing on this, it looks like MS uses latin1 (or something very close) internally.
I used a slightly modified version of the example script from Spreadsheet::ParseExcel, replacing the "print" with a database insert.
When processing some data from a spreadsheet into a PostgreSQL database, cells with symbols such as 0xae (ascii 92, the "registered" symbol, ®), I constantly came up against the database error:After setting the client encoding to latin1 (keeping the database at UTF8):DBD::Pg::db do failed: ERROR: invalid byte sequence for encoding "UTF +8": 0xae
$dbh->do("set client_encoding to latin1");
the data went in OK.
If there is a different/better way to process this, I'd be interested to know.
Update: there is a better way...#!/usr/bin/perl use warnings; use strict; use Spreadsheet::ParseExcel; my $parser = Spreadsheet::ParseExcel->new(); my $workbook = $parser->Parse('Book1.xls'); binmode(STDOUT, ":utf8"); foreach my $worksheet ( $workbook->worksheets() ) { my ( $row_min, $row_max ) = $worksheet->row_range(); my ( $col_min, $col_max ) = $worksheet->col_range(); foreach my $row ( 1 .. $row_max ) { foreach my $col ( $col_min .. $col_max ) { my $cell = $worksheet->get_cell( $row, $col ); next unless $cell; next unless defined($col_mapping{$col}); my $value = $cell->value(); utf8::upgrade($value); ... store_in_database($value); ... } } }
rdfield
|
---|
In Section
Seekers of Perl Wisdom