
Re: ( PDF::EasyPDF ) encoding problem

by almut (Canon)
on Sep 07, 2009 at 12:59 UTC ( #793944=note )

in reply to ( PDF::EasyPDF ) encoding problem

From a quick look at the PDF::EasyPDF source, I'd say it doesn't have any support for Unicode at all...

So, try to Encode::encode() your $string to ISO-8859-1 (aka Latin-1) before you pass it to the ->text() method.  You also need use utf8; if you have literal non-ASCII strings in your source code (as in your example) and are using a Unicode editor (which I suppose you are, otherwise you wouldn't be getting the results you're currently seeing...). That pragma tells Perl that the source file itself is in UTF-8.

(Of course this approach only works for characters that are actually encodable in ISO-8859-1, like "é"...)
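
A minimal sketch of that conversion, assuming the script file is saved as UTF-8. The sample text and variable names are made up for illustration, and the PDF::EasyPDF calls themselves are left out:

```perl
use strict;
use warnings;
use utf8;                       # the string literal below is UTF-8 in the source file
use Encode qw(encode);

my $string = "café";            # hypothetical sample text, 4 characters

# Turn Perl's character string into ISO-8859-1 bytes; these bytes are
# what you would then hand to the ->text() method.
my $latin1 = encode("iso-8859-1", $string);

printf "%d characters -> %d bytes, last byte 0x%02X\n",
    length($string), length($latin1), ord(substr($latin1, -1));

# A character with no ISO-8859-1 code point (here the Euro sign) cannot
# be represented; with the default settings encode() substitutes "?".
my $euro = encode("iso-8859-1", "\x{20AC}");
print "Euro sign becomes: $euro\n";
```

Without use utf8; Perl would treat the two UTF-8 bytes of the literal "é" as two separate characters, and the encoded result would be wrong.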

Re^2: ( PDF::EasyPDF ) encoding problem
by lepetitalbert (Abbot) on Sep 07, 2009 at 19:49 UTC


    You're right moritz, I've already read this article several times but I still cannot say I'm really comfortable with this stuff.

    I thought it was some package limitation, as I usually don't have problems with this (French-speaking area here), but I couldn't confirm it.

    I tried your solution and almut's, but nothing changed.

    Thanks and have a nice day.

    "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

      I just had a closer look at the module...  Part of the problem is that PDF::EasyPDF specifies the encoding for its 14 Adobe Base Fonts (all it supports) as "MacRomanEncoding", which is kind of unfortunate, as this encoding is rather different from ISO-8859-1, which Perl defaults to in most cases (for example, "é" is 0x8E in Mac Roman, while it's 0xE9 in ISO-Latin-1 and CP1252).  In other words, even if you had successfully solved the UTF-8 to ISO-8859-1 conversion issue, it still wouldn't work...
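
      The byte values can be checked with the core Encode module. This is only a quick sketch; the variable names are arbitrary:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{E9}";    # U+00E9, LATIN SMALL LETTER E WITH ACUTE
my %byte;

for my $enc (qw(MacRoman iso-8859-1 cp1252)) {
    my $copy = $char;   # work on a copy so the original stays untouched
    $byte{$enc} = sprintf "0x%02X", ord(encode($enc, $copy));
    print "$enc: $byte{$enc}\n";
}
```

      Mac Roman gives 0x8E while the other two give 0xE9, so bytes written as Latin-1 are drawn as the wrong glyph when the font is declared with /MacRomanEncoding.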

      But you can get it working with two small changes to the module source (tested, i.e. works for me):

      1. Replace all 14 occurrences of "/MacRomanEncoding" with "/WinAnsiEncoding" (case is important). Windows ANSI encoding (CP1252) is roughly the equivalent of ISO-8859-1/ISO-8859-15; one of the differences is that the Euro symbol is in a different position (0xA4 in ISO-8859-15, 0x80 in CP1252).

      2. Change this line

        open (EASYPDF,">$self->{file}") or die ...

        to read

        open (EASYPDF, ">:encoding(cp1252)", $self->{file}) or die ...

        (and don't forget to put use utf8; in your script)
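
      To see what the encoding layer from step 2 does on its own, here is a small stand-alone sketch. It writes to a temporary file instead of going through PDF::EasyPDF, and the variable names are made up:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary file standing in for the PDF output file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Same three-argument open with an encoding layer as in the patched line.
open(my $out, ">:encoding(cp1252)", $path) or die "Cannot open $path: $!";
print {$out} "\x{E9}\x{20AC}";      # "é" and the Euro sign as Perl characters
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what was actually written.
open(my $in, "<:raw", $path) or die "Cannot open $path: $!";
my $bytes = do { local $/; <$in> };
close $in;

print join(" ", map { sprintf "0x%02X", ord($_) } split //, $bytes), "\n";
```

      The layer converts each character to its CP1252 byte on output (0xE9 for "é", 0x80 for the Euro sign), so the script can print ordinary Perl character strings without calling encode() by hand.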

      Update: alternatively, you could leave PDF::EasyPDF's /MacRomanEncoding declarations in place and have Perl convert to that encoding directly (">:encoding(MacRoman)"), which would work, too (except for the Euro symbol).
