http://qs321.pair.com?node_id=430763


in reply to UTF-8 and browsers - Update

Bug in Firefox. It should work as you describe.

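As for the composition: first of all, work on characters (or at least on codepoints), not on UTF-8 bytes. Second, you want Unicode Normalization Form C (NFC, see Unicode::Normalize), which composes a base letter followed by a combining mark into the single precomposed character where one exists, so that you can write: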

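    use strict;
    use warnings;
    use Unicode::Normalize;     # exports NFC, NFD, ...
    use charnames ':full';      # just to make the \N{...} escape easier in this example
    binmode(STDOUT, ':utf8');   # make 'print' output UTF-8 bytes

    # avoid 'my $a' / 'my $b': those names are reserved for sort
    my $decomposed = "O\N{COMBINING DIAERESIS}";   # 2 codepoints
    my $composed   = NFC($decomposed);             # 1 codepoint, U+00D6
    print length($decomposed), $decomposed, "\n";
    print length($composed),   $composed,   "\n";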

Will print:

    2Ö
    1Ö

(more or less, depending on PerlMonks' escaping mechanisms and on whether your terminal renders combining characters)
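A minimal sketch of the first point (characters vs. UTF-8 bytes), assuming the input arrives as raw UTF-8 bytes, e.g. read from a socket or from a file opened without an encoding layer. It uses the core Encode module's decode to turn bytes into a character string; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC NFD);

# Raw input: the two UTF-8 bytes that encode U+00D6 (LATIN CAPITAL LETTER O WITH DIAERESIS)
my $bytes = "\xC3\x96";

# Decode the bytes into a Perl character string (one codepoint, U+00D6)
my $chars = decode('UTF-8', $bytes);

# Decomposed form: 'O' followed by U+0308 COMBINING DIAERESIS
my $nfd = NFD($chars);

print length($bytes), "\n";      # 2: counting bytes
print length($chars), "\n";      # 1: counting characters
print length($nfd), "\n";        # 2: base letter + combining mark
print length(NFC($nfd)), "\n";   # 1: NFC recomposes them
```

Once the text is decoded, length, substr and regexes all operate on characters; encode back to UTF-8 (or use an output layer such as ':encoding(UTF-8)') only when writing it out.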

-- 
        dakkar - Mobilis in mobile

Most of my code is tested...

Perl is strongly typed, it just has very few types (Dan)