The problem is indeed related to APR::Table, and is as follows:
- APR::Table is a Perl interface to the underlying C code in Apache.
- The UTF8 flag (see perlunicode), which marks a string as being UTF8 encoded, is internal to Perl.
- The C code has nowhere to store that flag.
So you generate the string with the UTF8 flag set correctly, put it into APR::Table, and it loses its flag. When you retrieve it, you get the individual bytes rather than the multibyte characters.
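The solution is to decode the bytes after you retrieve them:
use Encode;
# get_data_from_APR_TABLE() stands in for however you read the value
my $bytes = get_data_from_APR_TABLE();
# turn the raw UTF8 bytes back into a Perl character string
my $characters = decode('utf8',$bytes);
decode() returns a character string with the UTF8 flag set, so Perl treats the data as characters again and all will be happy.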
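To see the effect without Apache in the picture, here is a minimal sketch that simulates the round trip. The encode() call only stands in for what happens when the string passes through the C layer, and the sample string is made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string with one non-ASCII character (U+00E9).
my $original = "caf\x{e9}";

# Simulate the trip through C: only the raw UTF-8 bytes survive.
my $bytes = encode('UTF-8', $original);

# Perl now sees 5 separate bytes rather than 4 characters.
my $len_bytes = length $bytes;

# Decoding turns the bytes back into a character string.
my $characters = decode('UTF-8', $bytes);
my $len_chars  = length $characters;

printf "bytes: %d, characters: %d, UTF8 flag: %s\n",
    $len_bytes, $len_chars,
    utf8::is_utf8($characters) ? 'on' : 'off';
```

The length difference is the quickest way to tell whether you are holding bytes or characters.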
The catch is that you need to know which strings are UTF8 before you decode them. I find it easier to make sure that everything coming into my app is converted to UTF8 at the point of entry, so I'm only ever dealing with one character set internally.
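One way to do the conversion at the point of entry might look like the sketch below. The decode_params() helper is hypothetical, a plain hash stands in for whatever holds your incoming values, and the Latin-1 fallback is purely an assumption for illustration; use whatever legacy encoding your clients really send.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every value in a hash of raw byte strings.
sub decode_params {
    my ($raw) = @_;
    my %decoded;
    for my $key (keys %$raw) {
        # FB_CROAK makes decode() die on malformed UTF-8;
        # LEAVE_SRC keeps the input string unmodified.
        my $chars = eval {
            decode('UTF-8', $raw->{$key},
                   Encode::FB_CROAK() | Encode::LEAVE_SRC());
        };
        # Assumption: bytes that are not valid UTF-8 are Latin-1.
        $chars = decode('ISO-8859-1', $raw->{$key})
            unless defined $chars;
        $decoded{$key} = $chars;
    }
    return \%decoded;
}

my $params = decode_params({
    name => "caf\xC3\xA9",   # valid UTF-8 bytes
    city => "Z\xFCrich",     # Latin-1 bytes, not valid UTF-8
});
```

After this step every value is a character string, so the rest of the app never has to ask which encoding it is holding.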