http://qs321.pair.com?node_id=376828


in reply to utf weirdness in regex

Decoding as utf8 here is very wrong. That is for when you have a byte sequence that is in UTF-8 but Perl does not know it. Yours is in Latin-1, and it is not valid UTF-8. To see the failure, retry it with
$string1 = Encode::decode(utf8 => $string1, Encode::FB_CROAK);
To convert all of them to valid Unicode, try:
$string1 = Encode::decode(latin1 => $string1, Encode::FB_CROAK);
$string2 = Encode::decode(latin1 => $string2, Encode::FB_CROAK);
$string3 = Encode::decode(latin1 => $string3, Encode::FB_CROAK);
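A minimal sketch of the difference, assuming a made-up sample string ("caf\xE9 au lait" is my own example, not the original poster's data). The same octets are decoded once as utf8 and once as latin1, both with FB_CROAK, so the first attempt dies and the second succeeds:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: "café au lait" as Latin-1 octets (0xE9 is e-acute).
my $octets = "caf\xE9 au lait";

# Decode a copy each time: with a CHECK value set, Encode may modify its source.
my $copy = $octets;

# As UTF-8 this croaks, because 0xE9 followed by a space is not a valid sequence.
my $as_utf8     = eval { Encode::decode( 'utf8', $copy, Encode::FB_CROAK ) };
my $utf8_failed = defined $as_utf8 ? 0 : 1;

# As Latin-1 this succeeds: every byte maps to exactly one code point.
$copy = $octets;
my $chars = Encode::decode( 'latin1', $copy, Encode::FB_CROAK );

printf "utf8 decode failed: %s\n", $utf8_failed ? "yes" : "no";
printf "latin1 decode gave %d characters\n", length $chars;
```

After the latin1 decode, Perl treats $chars as a character string, which is what the regex in the original question needs.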
Boris

Re^2: utf weirdness in regex
by december (Pilgrim) on Jul 24, 2004 at 04:35 UTC

    Thanks, that looks a lot more like what I expected!

    In the future, I will use the CHECK argument to see if something went wrong in the conversion. I hope that will lift some of my initial confusion as to what is in which charset...

      Of course, when using FB_CROAK as the CHECK argument, you normally want to wrap it in an eval:
      my $encoding = "whatever";
      my $octets = "characters in whatever encoding...";
      eval { $_ = decode( $encoding, $octets, Encode::FB_CROAK ) };   # block eval: checked at compile time, still traps the croak
      if( $@ ) { report_an_error(); ... }
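      The same idea can be packaged as a small helper. This is only a sketch: try_decode is a hypothetical name of my own, not part of Encode, and the sample strings are made up. It tests the return value of eval rather than $@ alone, which is a common precaution since $@ can be reset before you get to look at it.

```perl
use strict;
use warnings;
use Encode ();

# try_decode is a hypothetical helper, not an Encode function.
# Returns (characters, undef) on success or (undef, error message) on failure.
sub try_decode {
    my ( $encoding, $octets ) = @_;    # $octets is a copy; the caller's data is untouched
    my $chars;
    my $ok = eval {
        $chars = Encode::decode( $encoding, $octets, Encode::FB_CROAK );
        1;                             # reached only if decode did not croak
    };
    return $ok ? ( $chars, undef ) : ( undef, $@ || 'unknown decode error' );
}

# "na\xEFve" is "naive" with an i-diaeresis, written as Latin-1 octets.
my ( $good, $err1 ) = try_decode( 'latin1', "na\xEFve" );
my ( $bad,  $err2 ) = try_decode( 'utf8',   "na\xEFve" );

print defined $good ? "latin1: ok\n" : "latin1: $err1";
print defined $bad  ? "utf8: ok\n"   : "utf8: $err2";
```

      Returning the error message alongside the result lets the caller decide whether to report it, fall back to another encoding, or give up.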