PerlMonks  

Re: DWIM with non ASCII characters

by moritz (Cardinal)
on May 07, 2010 at 07:17 UTC ( [id://838873]=note )


in reply to DWIM with non ASCII characters

What do you think is the best strategy for handling non ASCII characters?

Decode everything that comes from the outside. Encode everything that leaves your program. use utf8;. Avoid locales if you can.
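A minimal sketch of the decode-on-input, encode-on-output rule, using the core Encode module. The sample bytes and the variable names are made up for illustration; real input would come from a file, socket or @ARGV.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from outside: "ñ" encoded as UTF-8.
my $bytes_in = "\xC3\xB1";

# Decode at the boundary: the two bytes become one character.
my $text = decode('UTF-8', $bytes_in);

# Inside the program, string functions now work on characters.
my $char_count = length $text;   # counts characters, not bytes
my $upper      = uc $text;       # upper-cases U+00F1 to U+00D1

# Encode at the boundary on the way out.
my $bytes_out = encode('UTF-8', $upper);

printf "%d character in, %d bytes out\n", $char_count, length $bytes_out;
```

The point of the pattern is that everything between the decode and the encode deals only with character strings, so decoded and undecoded data never get mixed.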

See Perl, encodings and Unicode and the Perl Programming/Unicode UTF-8 WikiBook.

Perl 6 - links to (nearly) everything that is Perl 6.

Re^2: DWIM with non ASCII characters
by Hue-Bond (Priest) on May 07, 2010 at 07:40 UTC
    Decode everything that comes from the outside. Encode everything that leaves your program. use utf8;

    Why use utf8;? As I understand the documentation, its purpose is to enable the source code to be in UTF-8 (so you can do e.g. my $ñ = 'foo'; where 'ñ' is not a single byte). It even says "Do not use this pragma for anything else than telling Perl that your script is written in UTF-8".

    I thought the preferred way to decode/encode the program's input/output was by using Encode.

    --
     David Serrano
     (Please treat my english text just like Perl code, i.e. feel free to notify me of any syntax, grammar, style and/or spelling errors. Thank you!).

      Why use utf8;? As I understand the documentation, its purpose is to enable the source code to be in UTF-8

      Yes, that way you avoid concatenating decoded and non-decoded strings.

      Of course it requires your script to be actually stored in UTF-8. But since the more general solution (use encoding $your_encoding) is severely broken (with respect to AUTOLOAD, thread safety and other issues), that's currently the only sane way to store non-ASCII Perl programs.

      As for the rest, I can only agree with what ikegami wrote; using IO layers is much more convenient than using encode() and decode() on every IO operation. More importantly, since there are fewer spots where you have to care about encoding, the probability of forgetting it somewhere (and getting mojibake in response) is much lower.
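As a rough illustration of the IO-layer approach, the sketch below attaches an :encoding(UTF-8) layer when opening the handle, so no explicit encode() or decode() call is needed. It uses an in-memory buffer instead of a real file only to stay self-contained; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Write a character string through an encoding layer.
# The layer encodes to UTF-8 bytes on the way out.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open for write: $!";
print {$out} "\x{F1}\n";          # U+00F1, LATIN SMALL LETTER N WITH TILDE
close $out or die "close: $!";

# Read it back through the same layer.
# The layer decodes the bytes into characters on the way in.
open my $in, '<:encoding(UTF-8)', \$buffer or die "open for read: $!";
my $line = <$in>;
close $in or die "close: $!";
chomp $line;

printf "%d bytes stored, %d character read\n", length $buffer, length $line;
```

The encoding is stated once per handle, which is the "fewer spots" argument made above.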


      More importantly, use utf8; allows you to do

      my $foo = 'ñ';

      So far, I've stuck to ASCII in my sources, so use utf8; wouldn't do anything for me.

      I thought the preferred way to decode/encode the program's input/output was by using Encode.

      No way. Why encode and decode everything yourself when you can let PerlIO do it? At least, that's the way I see it.

        More importantly, use utf8; allows you to do
        my $foo = 'ñ';

        Hmm, then I must have configured something in my system, since I can do that without use'ing utf8:

        $ xxd ñ.pl
        0000000: 7072 696e 7420 27c3 b127 0a              print '..'.
        $ env -i /usr/bin/perl -Mstrict -wl ñ.pl
        ñ
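One way to see what use utf8; changes for such a literal is to compare its length with and without the pragma. The sketch assumes this snippet is itself saved as UTF-8. Without the pragma the literal is two bytes, which are printed unchanged, and a UTF-8 terminal renders them as ñ again.

```perl
use strict;
use warnings;

my ($with, $without);

{
    use utf8;                  # literal is decoded: one character
    $with = length 'ñ';
}

{
    no utf8;                   # literal stays as the raw bytes C3 B1
    $without = length 'ñ';
}

print "length with use utf8: $with, without: $without\n";
```

The difference only matters once the string is treated as text (length, regexes, uc) or concatenated with decoded data.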

