
Re: utf weirdness in regex

by borisz (Canon)
on Jul 23, 2004 at 08:24 UTC (#376828)

in reply to utf weirdness in regex

Decoding as utf8 here is very wrong. decode is for when you have a sequence of octets that is in utf8, but perl does not know it. Yours is in latin1, and it does not convert to valid utf8. To see the error, retry it with
$string1 = Encode::decode(utf8 => $string1, Encode::FB_CROAK);
To convert all three to valid Unicode, decode them as latin1 instead:
$string1 = Encode::decode(latin1 => $string1, Encode::FB_CROAK);
$string2 = Encode::decode(latin1 => $string2, Encode::FB_CROAK);
$string3 = Encode::decode(latin1 => $string3, Encode::FB_CROAK);
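
To make the difference concrete, here is a minimal sketch. The sample string is an assumption (the original strings from the question are not shown here), and Encode::LEAVE_SRC is added only so the input variable survives both attempts:

```perl
use strict;
use warnings;
use Encode ();

# Assumed sample data: "cafe au lait" with e-acute stored as the latin1 byte 0xE9.
my $latin1_octets = "caf\xE9 au lait";

# LEAVE_SRC stops decode from overwriting its source when a CHECK value is given.
my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

# Attempt 1: decode as utf8. 0xE9 followed by a space is not a valid
# UTF-8 sequence, so FB_CROAK makes decode die and eval returns undef.
my $as_utf8 = eval { Encode::decode( utf8 => $latin1_octets, $check ) };
my $utf8_error = $@;

# Attempt 2: decode as latin1. Every byte maps to a character, so this succeeds.
my $text = Encode::decode( latin1 => $latin1_octets, $check );

print "utf8 decode failed: $utf8_error" unless defined $as_utf8;
printf "latin1 decode gave %d characters\n", length $text;
```

The croak in the first attempt is exactly the signal that the octets were never utf8 to begin with.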

Re^2: utf weirdness in regex
by december (Pilgrim) on Jul 24, 2004 at 04:35 UTC

    Thanks, that looks a lot more like what I expected!

    In the future, I will use the CHECK argument to see if something went wrong in the conversion. I hope that will lift some of my initial confusion as to what is in which charset...

      Of course, when using FB_CROAK as the CHECK argument, you normally want to wrap it in an eval:
      use Encode;
      my $encoding = "whatever";
      my $octets = "characters in whatever encoding...";
      # A block eval traps the die from FB_CROAK without compiling a string at run time.
      eval { $_ = decode( $encoding, $octets, Encode::FB_CROAK ) };
      if( $@ ) { report_an_error(); ... }
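
When it is unclear which charset the octets are in, the same eval pattern can drive a fallback. This is only a sketch under assumptions: decode_guess is a hypothetical helper name, and it assumes the only two candidates are UTF-8 and latin1, as in this thread. Octets that happen to be valid UTF-8 will always be treated as UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns (decoded text, name of the encoding that worked).
sub decode_guess {
    my ($octets) = @_;

    # Try strict UTF-8 first. LEAVE_SRC keeps $octets intact for the fallback.
    my $text = eval {
        Encode::decode( 'UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC );
    };
    return ( $text, 'UTF-8' ) if defined $text;

    # latin1 maps every byte to a character, so this decode cannot croak.
    return ( Encode::decode( 'latin1', $octets ), 'latin1' );
}

# Assumed sample inputs: the same text in each encoding.
my ( $t1, $e1 ) = decode_guess("caf\xC3\xA9 au lait");    # valid UTF-8 octets
my ( $t2, $e2 ) = decode_guess("caf\xE9 au lait");        # latin1 octets

print "first input decoded as $e1, second as $e2\n";
```

Both calls end up with the same character string; only the reported encoding differs.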
