Hi,
I'm grateful for your detailed explanation, but I am still having problems.
If I run your code with the micron encoded as \x{C2}\x{B5}, then just using decode('utf8',$clob) seems to work, as you can see from the first set of clob/conv strings below (after the byte dumps).
clob: 74:68:69:73:20:69:73:20:73:74:72:69:6E:67:20:77:69:74:68:20:C2:B5:20:69:6E:20:69:74 -- byte
conv: 74:68:69:73:20:69:73:20:73:74:72:69:6E:67:20:77:69:74:68:20:C2:B5:20:69:6E:20:69:74 -- utf8
unix perlio
clob: 'this is string with µ in it'
conv: 'this is string with µ in it'
unix perlio encoding(utf8) utf8
clob: 'this is string with õ in it'
conv: 'this is string with µ in it'
However, if I actually type a micron into the string using Alt-0181, then I get the following output (note that I turned use diagnostics on):
clob: 74:68:69:73:20:69:73:20:73:74:72:69:6E:67:20:77:69:74:68:20:B5:20:69:6E:20:69:74 -- byte
conv: 74:68:69:73:20:69:73:20:73:74:72:69:6E:67:20:77:69:74:68:20:EF:BF:BD:20:69:6E:20:69:74 -- utf8
unix perlio
clob: 'this is string with µ in it'
Wide character in print at 742047.pl line 19 (#1)
(W utf8) Perl met a wide character (>255) when it wasn't expecting
one. This warning is by default on for I/O (like print). The easiest
way to quiet this warning is simply to add the :utf8 layer to the
output, e.g. binmode STDOUT, ':utf8'. Another way to turn off the
warning is to add no warnings 'utf8'; but that is often closer to
cheating. In general, you are supposed to explicitly mark the
filehandle with an encoding, see open and perlfunc/binmode.
conv: 'this is string with � in it'
unix perlio encoding(utf8) utf8
clob: 'this is string with µ in it'
conv: 'this is string with � in it'
That last conv string is, I assume, your splodge? Perhaps then, as no question marks are being output, this is not an encoding problem at all?
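For reference, here is a minimal sketch of what the two byte dumps above appear to show. It assumes the Alt-0181 keystroke stored the micron as the single Latin-1 byte B5 (that is a guess from the dump, not something I have confirmed), and hexdump is just a helper name made up for this example:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: colon-separated hex dump of a byte string,
# in the same style as the clob/conv dumps above.
sub hexdump {
    my ($bytes) = @_;
    return join ':', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $utf8_bytes   = "\xC2\xB5";   # micron already encoded as UTF-8
my $latin1_bytes = "\xB5";       # single byte, assumed to be what Alt-0181 produced

# Valid UTF-8 decodes to the single character U+00B5.
my $ok = decode('utf8', $utf8_bytes);

# A lone B5 is not valid UTF-8, so decode() substitutes U+FFFD,
# which re-encodes as EF:BF:BD (the bytes seen in the second conv dump).
my $bad = decode('utf8', $latin1_bytes);

# Decoding the same byte as ISO-8859-1 yields U+00B5 instead.
my $fixed = decode('iso-8859-1', $latin1_bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "C2 B5 as utf8   -> U+%04X\n", ord $ok;
printf "B5 as utf8      -> U+%04X (%s)\n", ord $bad,   hexdump(encode('UTF-8', $bad));
printf "B5 as latin-1   -> U+%04X (%s)\n", ord $fixed, hexdump(encode('UTF-8', $fixed));
```

If that assumption holds, the typed string would need decode('iso-8859-1', $clob) (or possibly cp1252) rather than decode('utf8', $clob), but I have not tried this against the real clob data.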
I honestly do appreciate all your time.
Joe.
Eschew obfuscation, espouse elucidation!