http://qs321.pair.com?node_id=661519


in reply to How to reverse a (Unicode) string

print scalar reverse "\noäu";

If you entered this using a UTF-8 editor, you forgot to "use utf8;" to tell Perl that the source code is UTF-8 encoded.

You may be dealing with the string "\no\x{C3}\x{A4}u" instead of the intended "\no\x{e4}u"!

reverse Works on bytes

reverse works on characters. If you have a byte string, every character represents the equivalent byte, so the bytes get reversed. If you have a Unicode text string, reverse properly reverses based on Unicode code points.
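A minimal sketch of the difference, using escapes so that it does not depend on how the file is saved. The variable names are only for illustration: $bytes stands for what Perl sees when the UTF-8 source is read without "use utf8;", and $text for what it sees with it.

```perl
use strict;
use warnings;

# The UTF-8 octets of "oäu": the "ä" takes two bytes (C3 A4).
my $bytes = "o\xC3\xA4u";
# The same text as a character string: "ä" is the single code point U+00E4.
my $text  = "o\x{E4}u";

# reverse in scalar context reverses the characters of the string.
my $rev_bytes = scalar reverse $bytes;   # the C3 A4 pair is swapped
my $rev_text  = scalar reverse $text;    # "ä" stays in one piece

printf "%d %d\n", length($bytes), length($text);     # prints "4 3"
print $rev_bytes eq "u\xA4\xC3o" ? "ok\n" : "not ok\n";  # prints "ok"
print $rev_text  eq "u\x{E4}o"   ? "ok\n" : "not ok\n";  # prints "ok"
```

In both cases reverse did the same thing; only the first string was never the text you meant. The reversed byte string is no longer valid UTF-8, because A4 C3 is not a legal sequence.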

You can solve this problem by decoding the text strings

This suggests that decoding is a workaround. It is not: it is something you should always do when dealing with text data!
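As a sketch of that habit (decode on input, work with characters, encode on output), assuming the data arrives as UTF-8 octets. The helper name reverse_utf8 is made up for this example; decode and encode come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 encoded octets, returns the reversed
# text, again as UTF-8 encoded octets.
sub reverse_utf8 {
    my ($octets) = @_;
    my $text     = decode('UTF-8', $octets);   # bytes -> characters
    my $reversed = scalar reverse $text;       # operate on characters
    return encode('UTF-8', $reversed);         # characters -> bytes
}

my $input  = "o\xC3\xA4u";            # "oäu" as UTF-8 octets
my $output = reverse_utf8($input);

print $output eq "u\xC3\xA4o" ? "ok\n" : "not ok\n";   # prints "ok"
```

For filehandles, an I/O layer such as binmode(STDOUT, ':encoding(UTF-8)') does the same job at the boundary, so the rest of the program only ever sees text strings.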

The use utf8; takes care that every string literal in the script is treated as a text string

Perl has no idea, and cannot be told, what kind of strings you have: binary or text. Without "use utf8" you don't necessarily have byte strings, but if you have text strings, the literals are interpreted as ISO-8859-1 rather than UTF-8. Note that ISO-8859-1 can be treated as a Unicode encoding: its 256 byte values map directly to the first 256 Unicode code points. It just doesn't support all of the characters.
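To illustrate the two interpretations, here is a small sketch that decodes the same two octets once as ISO-8859-1 and once as UTF-8. The helper code_points exists only for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: list the code points of a string as U+XXXX.
sub code_points {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}

my $octets = "\xC3\xA4";    # the UTF-8 encoding of "ä"

my $as_latin1 = decode('ISO-8859-1', $octets);   # one character per byte
my $as_utf8   = decode('UTF-8',      $octets);   # one character in total

print code_points($as_latin1), "\n";   # prints "U+00C3 U+00A4"
print code_points($as_utf8), "\n";     # prints "U+00E4"

# Decoding as ISO-8859-1 yields the same characters the undecoded
# string already has, which is what happens to a literal without
# "use utf8;".
print $as_latin1 eq $octets ? "same\n" : "different\n";   # prints "same"
```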

The rest of your post is accurate, but I wanted to respond so that newcomers don't get a negative impression of Perl's Unicode support from your post. Perl's Unicode support is great, but the programmer MUST learn the difference between Unicode and UTF-8, and the difference between text data and binary data.