Re: utf weirdness in regex

in reply to utf weirdness in regex

Decoding as utf8 here is wrong. That is for when you have a byte sequence that is already encoded in UTF-8, but Perl does not know it. Yours is in latin1, and it is not valid UTF-8. To see that, retry it with:

$string1 = Encode::decode(utf8 => $string1, Encode::FB_CROAK);

To convert them all to valid Unicode, try:

$string1 = Encode::decode(latin1 => $string1, Encode::FB_CROAK);
$string2 = Encode::decode(latin1 => $string2, Encode::FB_CROAK);
$string3 = Encode::decode(latin1 => $string3, Encode::FB_CROAK);
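To make the difference concrete, here is a minimal sketch, not taken from the original post. The sample word and the variable names are made up for illustration. It decodes the same word once from latin1 octets and once from UTF-8 octets, and LEAVE_SRC is added only so that decode does not modify the input variables in place.

```perl
use strict;
use warnings;
use Encode ();

# The same word ("cafe" with an acute accent on the e) as raw octets
# in two different encodings. Both samples are made up for illustration.
my $latin1_octets = "caf\xE9";       # 4 bytes, ISO-8859-1
my $utf8_octets   = "caf\xC3\xA9";   # 5 bytes, UTF-8

# FB_CROAK dies on malformed input; LEAVE_SRC keeps the input untouched.
my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

my $from_latin1 = Encode::decode('latin1', $latin1_octets, $check);
my $from_utf8   = Encode::decode('UTF-8',  $utf8_octets,   $check);

# Both results are character strings of length 4 ending in U+00E9.
printf "%d %d %s\n",
    length $from_latin1,
    length $from_utf8,
    ($from_latin1 eq $from_utf8 ? "same" : "different");
```

Once both strings are decoded with the encoding they were really in, they compare equal and a regex sees the same characters in each.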

Boris

Re^2: utf weirdness in regex by december (Pilgrim) on Jul 24, 2004 at 04:35 UTC
Thanks, that looks a lot more like what I expected! In the future, I will use the CHECK argument to see if something went wrong in the conversion. I hope that will lift some of my initial confusion as to what is in which charset...
Re^3: utf weirdness in regex by graff (Chancellor) on Jul 24, 2004 at 08:43 UTC
Of course, when using FB_CROAK as the CHECK argument, you normally want to wrap the call in an eval:

use Encode;   # exports decode()
my $encoding = "whatever";
my $octets = "characters in whatever encoding...";
# a block eval traps the croak without compiling a string at run time
eval { $_ = decode( $encoding, $octets, Encode::FB_CROAK ) };
if( $@ ) { report_an_error(); ... }
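For a version that runs as-is, here is a rough sketch of the same idea. The helper name try_decode and the sample octets are hypothetical, not something from Encode or from the posts above. It tries one byte string against two encodings and reports which decode survived.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns (decoded string, undef) on success,
# or (undef, error text) if decode croaked.
sub try_decode {
    my ($encoding, $octets) = @_;
    my $chars = eval {
        Encode::decode($encoding, $octets,
                       Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return defined $chars
        ? ($chars, undef)
        : (undef, $@ || 'unknown decode error');
}

my $octets = "caf\xE9";   # made-up sample: latin1 bytes, not well-formed UTF-8

for my $encoding ('UTF-8', 'latin1') {
    my ($chars, $err) = try_decode($encoding, $octets);
    if (defined $chars) {
        printf "%s: ok, %d characters\n", $encoding, length $chars;
    }
    else {
        print "$encoding: failed\n";
    }
}
```

Returning the error text instead of dying lets the caller decide whether to fall back to another encoding or give up.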
