PerlMonks  

Non english characters

by Ryszard (Priest)
on Oct 17, 2005 at 08:20 UTC ( [id://500674]=perlquestion )

Ryszard has asked for the wisdom of the Perl Monks concerning the following question:

For the first time I'm attempting to print non-English characters with Perl, specifically French characters, and I'm getting the dreaded "Malformed UTF-8 character" error. locale -a shows all the French locales installed (I won't list them unless requested).

#!/usr/bin/perl -w
use strict;
use utf8;
print "une certaine phrase aléatoire en français\n";

The output is: une certaine phrase alatoire en franais

If someone could shed some light on how to output the correct accented characters, I would be grateful.

Replies are listed 'Best First'.
Re: Non english characters
by inman (Curate) on Oct 17, 2005 at 09:47 UTC
    Your script is saved in a non-UTF-8 character set (probably Latin-1), so the string literals in it are in that character set too. When you 'use utf8', Perl tries to interpret the source as UTF-8. When it hits the 'é', it complains: in Latin-1 that character is the single byte 0xE9, which is not a valid UTF-8 sequence, whereas in UTF-8 it would be a two-byte sequence (0xC3 0xA9).

    Have a look at the Encode module. Try out a number of scripts and save the output to a file. If you examine the file using a hex editor you will see the double-byte representations of non-ASCII characters. The documentation for this module is full of references to UTF-8. Also look at perluniintro and friends.

    This script takes data from STDIN and writes out the UTF8 converted data to a file.

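    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode;

    # Input bytes are treated as Latin-1 characters and re-encoded as UTF-8.
    open my $out, '>', 'utf8.txt' or die "Cannot open utf8.txt: $!";
    while (<>) {
        my $data = encode("utf8", $_);
        print $out $data;    # $_ already ends with a newline
    }
    close $out;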
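
The byte-level difference described in this reply can also be checked without a hex editor. What follows is a minimal sketch, assuming only the core Encode module. The helper name hex_bytes is made up for illustration, and the é is written as the escape \x{E9} so that the example does not depend on how the file itself is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# hex_bytes is a hypothetical helper: it renders each byte of a
# byte string as two uppercase hex digits, separated by spaces.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my $e_acute = "\x{E9}";    # the character U+00E9 (e with acute accent)

my $latin1 = encode('ISO-8859-1', $e_acute);    # one byte
my $utf8   = encode('UTF-8',      $e_acute);    # two bytes

print 'Latin-1: ', hex_bytes($latin1), "\n";    # E9
print 'UTF-8:   ', hex_bytes($utf8),   "\n";    # C3 A9
```

A lone 0xE9 byte followed by an ordinary ASCII letter is what a UTF-8 parser rejects, which matches the "Malformed UTF-8 character" message in the question.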
Re: Non english characters
by ioannis (Abbot) on Oct 17, 2005 at 10:34 UTC
    Please store the French words in the file 'french.txt', then try this from the Unix command line:

    # Set LC_ALL to fr_FR.utf8 (or whatever is the proper locale)
    $ export LC_ALL=fr_FR.utf8
    # ensure that the French words are properly displayed
    $ cat french.txt
    # you should see perl displaying proper French
    $ cat french.txt | perl -Mencoding='utf8' -e 'print <STDIN>'

    If the French did not display, either LC_ALL is not properly set, or the editor is not storing its words in UTF-8.
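
The pipeline in this reply depends on perl decoding its input as UTF-8. The sketch below illustrates only that decoding step, under a few assumptions: it uses a PerlIO :encoding(UTF-8) layer instead of the encoding pragma, it reads from an in-memory filehandle rather than french.txt, and the UTF-8 bytes are written as escapes so the example does not depend on the file's own encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# UTF-8 bytes for "français\n": the c-cedilla is the pair 0xC3 0xA7.
my $bytes = "fran\xC3\xA7ais\n";

# Read the bytes back through a decoding layer, as if from a file.
open my $in, '<:encoding(UTF-8)', \$bytes or die "Cannot open buffer: $!";
my $line = <$in>;
close $in;
chomp $line;

# The two bytes of the c-cedilla decode to a single character.
printf "bytes: %d, characters: %d\n", length($bytes) - 1, length($line);
```

If the input were Latin-1 instead, the same layer would not see valid UTF-8, which is the other failure case this reply mentions.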
Re: Non english characters
by pajout (Curate) on Oct 17, 2005 at 09:21 UTC
    Which perl version did you use?
      From the perlunicode manpage:

      "use utf8" still needed to enable UTF-8/UTF-EBCDIC in scripts
      As a compatibility measure, the "use utf8" pragma must be explicitly included to enable recognition of UTF-8 in the Perl scripts themselves (in string or regular expression literals, or in identifier names) on ASCII-based machines or to recognize UTF-EBCDIC on EBCDIC-based machines. These are the only times when an explicit "use utf8" is needed. See utf8.

      Simply remove use utf8; from your script and you should get the output you desire :)

      --Darren

      Oh yeah, I forgot: that would be Perl 5.8.7 :-)
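
An alternative to removing the pragma, offered here as an assumption and not something proposed in the thread: save the script as UTF-8, keep use utf8, and tell Perl how to encode its output with an :encoding(UTF-8) layer. To keep the sketch independent of how the file is saved, the accented letters are written as \x{...} escapes, and the output goes to an in-memory filehandle so the resulting bytes can be inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Character string: \x{E9} is e-acute and \x{E7} is c-cedilla.
my $phrase = "une certaine phrase al\x{E9}atoire en fran\x{E7}ais\n";

# Write through an encoding layer into an in-memory buffer.
my $buffer = '';
open my $fh, '>:encoding(UTF-8)', \$buffer or die "Cannot open buffer: $!";
print {$fh} $phrase;
close $fh;

# Each of the two accented characters is written as two bytes,
# so the buffer is two bytes longer than the character count.
printf "characters: %d, bytes written: %d\n",
    length($phrase), length($buffer);

# For real output, the same layer can be applied to STDOUT instead:
# binmode(STDOUT, ':encoding(UTF-8)');
```

Whether this or simply dropping use utf8 is the better fit depends on which encoding the editor and the terminal actually use.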
