http://qs321.pair.com?node_id=645194


in reply to Re^4: PDF::Template and character encodings
in thread PDF::Template and character encodings

As per your OP, do you really get "~Aj", or is it perchance "Ã¡" (the two characters that á, an a-acute, turns into when its UTF-8 bytes are displayed as Latin-1)?

If that is the case, you are getting utf-8 from your database - run that data through Encode. Alternatively, try using iso10646-1 (or utf8 without the hyphen).

Using those fonts might fail since it seems likely that the strings coming from the database don't have the internal UTF8 flag set.
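To illustrate what "run that data through Encode" and the UTF8 flag mean, here is a minimal sketch. The sample string is a hypothetical stand-in for a value fetched from the database, and the helper name utf8_bytes_to_latin1 is made up for this example; nothing here is specific to PDF::Template or PDFlib.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for a database value: the UTF-8 bytes of "Terán"
# (á is the two bytes 0xC3 0xA1 in UTF-8).
my $raw = "Ter\xC3\xA1n";

# Undecoded, Perl sees six separate bytes and the UTF8 flag is off.
printf "raw:     length %d, UTF8 flag %s\n",
    length($raw), utf8::is_utf8($raw) ? "on" : "off";

# decode() turns the bytes into a five-character string with the flag on.
my $chars = decode("UTF-8", $raw);
printf "decoded: length %d, UTF8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? "on" : "off";

# Hypothetical helper: decode UTF-8 octets, then re-encode them as Latin-1.
sub utf8_bytes_to_latin1 {
    my ($octets) = @_;
    return encode("iso-8859-1", decode("UTF-8", $octets));
}

my $latin1 = utf8_bytes_to_latin1($raw);
printf "latin1:  length %d\n", length($latin1);
```

Whether the decoded character string or the Latin-1 bytes should be handed on depends on what the font and encoding setting expect, which I can't tell from here.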

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^6: PDF::Template and character encodings
by geektron (Curate) on Oct 16, 2007 at 17:15 UTC
    re OP and problematic character: the top half of that 'pipe' character looks more like a dot, which is why i thought it was a 'j'. (it's supposed to be an a-accent, not a-grave)

    setting the pdf_encoding='utf8' also blows up with the same "can't find encoding" error.

    I'm reading up on Encode, though I'm not sure if I need an encode/decode sequence or a simpler transform.

      Erm, yes, it's an a-accent (or a-acute). Fixed in previous post.

      Looks definitely like utf-8 data not passed as such. Try

      use Encode qw(from_to);
      ...
      while ($r = $sth->fetchrow_hashref()) {
          from_to($r->{$_}, "utf8", "latin1") for keys %$r;
      }

      or such, and try with iso8859-1. How did the iso10646-1 font work?
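A self-contained variant of that loop, as a sketch only: the row hash is a hypothetical stand-in for what $sth->fetchrow_hashref() returns, and row_to_latin1 is a made-up helper name. It skips undefined values, on the assumption that NULL columns arrive as undef.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Convert every defined value of a row hash from UTF-8 to Latin-1.
# from_to() works on octets and overwrites its first argument in place.
sub row_to_latin1 {
    my ($row) = @_;
    for my $col (keys %$row) {
        next unless defined $row->{$col};    # skip NULL columns (undef)
        from_to($row->{$col}, "UTF-8", "iso-8859-1");
    }
    return $row;
}

# Hypothetical row standing in for $sth->fetchrow_hashref().
my $r = { name => "Ter\xC3\xA1n", city => undef };
row_to_latin1($r);

# Print the converted bytes in hex so the terminal's charset does not matter.
print join(" ", map { sprintf "%02X", ord($_) } split //, $r->{name}), "\n";
```

The single E1 byte in the output would be the Latin-1 á; if the input is not really UTF-8, the result will look different.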

      I have no experience with pdflib and PDF::Template (is pdflib an external library?) and there might be more settings that interfere, e.g. what is your system's default charset? Is the charset of your shell the system charset, or does it differ?

      --shmem

        no luck with the iso10646-1 font.

        the from_to is dropping the accented character and everything after it ... Terán becomes Ter ...
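When a conversion mangles a string like this, it might be worth checking whether the bytes are UTF-8 at all before converting them. A minimal sketch, assuming nothing about the real data: the two samples are hypothetical, and hex_bytes and is_valid_utf8 are made-up helper names.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return a space-separated hex dump of the bytes in a string.
sub hex_bytes {
    my ($octets) = @_;
    return join " ", map { sprintf "%02X", ord($_) } split //, $octets;
}

# Return 1 if the octets are well-formed UTF-8, 0 otherwise.
# Strict decoding with FB_CROAK dies on malformed input.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $copy = $octets;    # decode() with a CHECK value may modify its input
    return eval { decode("UTF-8", $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Hypothetical samples: "Terán" as UTF-8 bytes and as Latin-1 bytes.
for my $sample ("Ter\xC3\xA1n", "Ter\xE1n") {
    printf "%-18s %s\n", hex_bytes($sample),
        is_valid_utf8($sample) ? "valid UTF-8" : "not valid UTF-8";
}
```

If the database value shows a lone E1 byte, it is probably Latin-1 already and should not go through a UTF-8 to Latin-1 conversion.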

        and yes, PDFlib is an external library. i suspect the problem essentially lies there ... not being too well versed in character set issues, i'm not sure what the system default is. i could force PDFlib to use another character set (by pulling in pdflib_pm and forcing a value into <code> $pdflib_pm:PDF_character_enc