I'm completely lost in encodings :(

I'm reading data using LWP. When the data contains umlauts, things start to get weird and I get lost.

As the whole script is already very complex, let me try to give examples first before I start thinking about an example script.

For the examples I use the Perl debugger.

After reading my data one of the strings looks like this:

  DB<11> x $str
0  'Künzler'
  DB<12> x Encode::is_utf8($str)
0  ''
  DB<13> x length($str)
0  8
  DB<14> x substr($str,0,3)
0  'Kü'

So I think my issue already starts here: the data I read is displayed correctly, but Perl treats it as bytes.
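To make that state reproducible outside my script, here is a minimal sketch of what I believe the debugger is showing. The hand-built octet string is an assumption standing in for the data I actually read; it is not taken from my real script.

```perl
use strict;
use warnings;
use Encode ();

# Assumption: my real data arrives as raw UTF-8 octets, so I build the
# same octets by hand here ("\xC3\xBC" is "ü" encoded as UTF-8).
my $str = "K\xC3\xBCnzler";

my $flag   = Encode::is_utf8($str) ? 1 : 0;  # internal UTF8 flag is off
my $len    = length $str;                    # counts octets, not characters
my $prefix = substr $str, 0, 3;              # "K" plus both octets of "ü"

# Show the prefix as hex so the output does not depend on the terminal.
printf "is_utf8=%d length=%d prefix_octets=%s\n",
    $flag, $len, join ' ', map { sprintf '%02X', ord } split //, $prefix;
```

This matches the session above: the flag is off, the length is 8 rather than 7, and the first three "characters" are really one letter plus a two-octet umlaut.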

I do not know what to do with that string so that perl handles it correctly.

  DB<20> x Encode::decode('utf8', $str)
0  'K?nzler'

(What was actually displayed was a question mark in a square.) That seemed wrong, but when I tested by reading from a UTF-8 file opened with '<:utf8', I got the same result, so apparently it is correct that way.
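My current guess is that the square appears because the decoded character U+00FC is printed to a handle without an encoding layer and therefore goes out as the single byte 0xFC, which a UTF-8 terminal cannot display. The sketch below tries to show that with in-memory handles; the variable names and the helper hex_dump are made up for this example.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: render a string as space-separated hex values.
sub hex_dump { return join ' ', map { sprintf '%02X', ord } split //, shift }

my $octets = "K\xC3\xBCnzler";                  # stand-in for my data
my $chars  = Encode::decode('UTF-8', $octets);  # now a character string

# Print the decoded string once without and once with an encoding layer.
# In-memory handles keep the example self-contained.
open my $raw, '>', \my $raw_out or die "open failed: $!";
print {$raw} $chars;
close $raw;

open my $enc, '>:encoding(UTF-8)', \my $enc_out or die "open failed: $!";
print {$enc} $chars;
close $enc;

printf "characters:    %d\n", length($chars);
printf "no layer:      %s\n", hex_dump($raw_out);
printf "UTF-8 layer:   %s\n", hex_dump($enc_out);
```

If this is right, the decoded string itself is fine and only the output side lacks an encoding, which would explain why reading through '<:utf8' gave me the same display.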

So as a test I changed my string here by decoding it as utf8.

In the next step, the string is handed to MIME::Lite::TT::HTML and from there to Text::Table. Finally it is sent to me by mail.

When I look at the mail's source, the umlaut is represented (in quoted-printable) as '=FC' and, unfortunately, displayed as a question mark in a square :(
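As far as I can tell, '=FC' is what quoted-printable produces for the ISO-8859-1 byte of 'ü', whereas a UTF-8 body would contain '=C3=BC'. The following check uses only Encode and MIME::QuotedPrint, not MIME::Lite::TT::HTML, so it says nothing certain about what that module does internally; it only compares the two encodings.

```perl
use strict;
use warnings;
use Encode ();
use MIME::QuotedPrint ();

# Stand-in for my decoded string (an assumption, not my real data).
my $chars = Encode::decode('UTF-8', "K\xC3\xBCnzler");

# Encode the same characters in two charsets, then quoted-printable.
# An empty $eol asks encode_qp not to add soft line breaks.
my $qp_latin1 = MIME::QuotedPrint::encode_qp(
    Encode::encode('ISO-8859-1', $chars), '');
my $qp_utf8 = MIME::QuotedPrint::encode_qp(
    Encode::encode('UTF-8', $chars), '');

print "ISO-8859-1: $qp_latin1\n";
print "UTF-8:      $qp_utf8\n";
```

So one thing I could debug further is whether the charset declared in the mail's Content-Type header matches the bytes that actually end up in the body.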

I know I should have some sample code, and I will try to write some, but I hoped that in the meantime someone here with more experience already has a hint for me, where to debug further or how to fix it.

As far as I know, Text::Table requires perl strings to properly format tables. But shouldn't my string be a perl string when decoded?

Many thanks in advance.

s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

In reply to Lost in encodings by Skeeve
