I'm at a loss. Whenever I try to handle Unicode/UTF-8 in Perl, I hit a wall trying to do it in a sane way.
Please tell me that I'm missing something here.
My goals are:
Text read from stdin, text written to stdout, and arguments on the command line should respect the current user locale.
Source code is in a fixed encoding (usually UTF-8).
Files/pipes should be in the encoding I specify.
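To make the first goal concrete, here is a minimal sketch of the manual route that I would rather not spell out in every script. It assumes I18N::Langinfo reports the locale's codeset and that Encode recognises that name (falling back to UTF-8 otherwise is my own assumption); locale_codeset and decode_args are just names I made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode find_encoding);
use I18N::Langinfo qw(langinfo CODESET);

# Ask the C library which character set the current locale uses.
# Assumption: if Encode does not know the name, fall back to UTF-8.
sub locale_codeset {
    my $name = langinfo(CODESET);
    return find_encoding($name) ? $name : 'UTF-8';
}

# Hypothetical helper: turn raw argument bytes into character strings.
sub decode_args {
    my ($codeset, @raw) = @_;
    return map { my $bytes = $_; decode($codeset, $bytes) } @raw;
}

my $codeset = locale_codeset();
my @args    = decode_args($codeset, @ARGV);

# Apply the locale's encoding to the three standard handles by hand.
binmode(STDIN,  ":encoding($codeset)");
binmode(STDOUT, ":encoding($codeset)");
binmode(STDERR, ":encoding($codeset)");

print "Locale codeset: $codeset\n";
print "Argument '$_' is ", length($_), " chars long.\n" for @args;
```

That handles stdin, stdout and @ARGV explicitly, but it is a lot of ceremony for every script, which is why I am hoping a pragma can do it for me.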
My example script:
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use open qw(:std :locale);
open(my $in, "-|:encoding(utf8)", "echo \xc3\xb6") or die "Cannot run echo: $!";
my $line=<$in>;
chomp($line);
print "I read a line, that is ",length($line)," chars long.\n";
print "That line is: ",$line,"\n";
$line =~ s/ö/o/;
print "That line in ascii is: $line\n";
Let's run it:
karoshi:~>LC_CTYPE=de_DE.UTF-8 ./u8demo.pl
I read a line, that is 1 chars long.
That line is: ö
That line in ascii is: o
karoshi:~>LC_CTYPE=C ./u8demo.pl
ascii "\xC3" does not map to Unicode at ./u8demo.pl line 12.
ascii "\xB6" does not map to Unicode at ./u8demo.pl line 12.
I read a line, that is 8 chars long.
That line is: \xC3\xB6
That line in ascii is: \xC3\xB6
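The second case fails horribly: the bytes from the pipe get decoded as ASCII even though I asked for :encoding(utf8) on that handle. I have no idea why.
If I comment out the "use open" line, it (of course) fails to print the umlaut on any UTF-8 terminal: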
karoshi:~>./u8demo.pl
I read a line, that is 1 chars long.
That line is: �
That line in ascii is: o
Is there a way to get Perl to "do the right thing"?