PerlMonks  

[SOLVED] encoding trouble

by Skeeve (Parson)
on Apr 26, 2019 at 14:33 UTC ( [id://1233018]=perlquestion )

Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

I wrote a script which reads a UTF-8 file, extracts data, and writes out some more UTF-8 files.

The strange thing is: The resulting files are broken. They seem to no longer be UTF-8.

I boiled the issue down to this small script which just pipes the input through:

#!/usr/bin/perl
use strict;
use warnings;

binmode(STDIN, ':utf8');
open my $out, '>:utf8', 'testfile.txt';
while (<>) {
    print $out $_;
}
close $out;

Having an input file containing

trallalala

äöüÄÖÜß

And calling my script with:

./encodingtrouble.pl input-encodingtrouble

The resulting output testfile.txt looks as shown below. (There seem to be some unprintable characters between the last Ãs.)

trallalala

äöüÃÃÃÃ

When I do not open with '>:utf8', the output looks correct, but I'm puzzled about what's going on here. What am I doing wrong? Where is my misunderstanding?


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Replies are listed 'Best First'.
Re: encoding trouble
by Your Mother (Archbishop) on Apr 26, 2019 at 14:48 UTC

    Essentially, you are re-encoding UTF-8 as UTF-8. Here is another example of what's going on, with the O in -CO selecting the output layer:

    perl -E 'say "äöüÄÖÜß"'
    äöüÄÖÜß
    
    perl -CO -E 'say "äöüÄÖÜß"'
    äöüÃÃÃÃ
    

    Add this to your script: use open ":std", ":encoding(utf8)";

    The documentation for the open pragma explains what's going on.
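The double encoding described above can be reproduced without any files. The following is a minimal sketch, assuming the core Encode module; the byte string stands in for the "ä" from the input file, and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# "ä" as the two raw UTF-8 bytes found in the input file.
my $bytes = "\xC3\xA4";

# Without an input layer Perl treats each byte as one Latin-1 character,
# so a UTF-8 output layer encodes both of them a second time.
my $double = encode('UTF-8', $bytes);

# Decoding first (the job of an input layer) yields a single character,
# which encodes back to the original two bytes.
my $round_trip = encode('UTF-8', decode('UTF-8', $bytes));

# Print each result as hex bytes so the terminal encoding does not matter.
printf "double encoded: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $double;
printf "round trip:     %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $round_trip;
```

Run as is, it should print `C3 83 C2 A4` for the double-encoded case and `C3 A4` for the round trip.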

Re: encoding trouble
by Eily (Monsignor) on Apr 26, 2019 at 14:43 UTC

    Because @ARGV is not empty when you read with <>, the handle that is read from is not STDIN but ARGV instead.

    use open IO => ':utf8'; may work better: ARGV is only opened by <>, at which point it is too late to call binmode because some data has already been read.

    s/perlvar/ARGV/, thanks choroba
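A sketch of the pragma approach, under some assumptions: it uses ':encoding(UTF-8)' rather than ':utf8', the file names demo-in.txt and demo-out.txt are made up for the demonstration, and @ARGV is set by hand in place of a command-line argument. The script writes its own sample input so that it can be run as is.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Default layers for handles opened in this lexical scope. This should
# include ARGV, which <> opens itself from the names in @ARGV.
use open IO => ':encoding(UTF-8)';

# Hypothetical file names, used only for this demonstration.
my ($in, $copy) = ('demo-in.txt', 'demo-out.txt');

# Write "äß" as raw UTF-8 bytes; the explicit :raw layer overrides the default.
my $bytes = "trallalala\n\xC3\xA4\xC3\x9F\n";
open my $raw, '>:raw', $in or die "Cannot write $in: $!";
print $raw $bytes;
close $raw or die "Cannot close $in: $!";

# Same loop as in the question, minus binmode(STDIN, ...).
@ARGV = ($in);
open my $out, '>', $copy or die "Cannot write $copy: $!";
while (<>) {
    print $out $_;
}
close $out or die "Cannot close $copy: $!";

# Read the copy back as bytes and compare it with what was written.
open my $check, '<:raw', $copy or die "Cannot read $copy: $!";
my $copied = do { local $/; <$check> };
close $check or die "Cannot close $copy: $!";

print $copied eq $bytes ? "bytes unchanged\n" : "bytes differ\n";
```

If the pragma reaches ARGV as expected, the input is decoded once and encoded once, so the copy should be byte-for-byte identical to the input.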

Re: encoding trouble
by karlgoethebier (Abbot) on Apr 26, 2019 at 16:29 UTC

Node Type: perlquestion [id://1233018]
Approved by davies
Front-paged by Corion