http://qs321.pair.com?node_id=764505

ambs has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I am almost sure this script used to work. But yesterday I noticed that when I call it with a URL containing Unicode characters (say, dic.pl?word=coração), the string does not come through correctly.

I am using 'use locale;' for Portuguese and 'use utf8;' because the script itself contains Unicode characters. I am also using 'binmode(STDIN,":utf8");' and 'binmode(STDOUT,":utf8");', and finally I am calling CGI's header function with '-charset=>"utf-8"'.

I also added the META tag to the HTML, and checked that Apache doesn't set a default charset.

But the string still appears as if it were Latin-1 (with two separate bytes for each Unicode character).
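A minimal sketch of how this symptom can arise, assuming the parameter reaches the script as undecoded UTF-8 bytes (the byte string below is a hand-written stand-in for the real CGI value). Encoding those bytes once more is roughly what a ':utf8' output layer does to a string Perl still regards as Latin-1:

```perl
use strict;
use warnings;
use Encode qw(encode);

# UTF-8 bytes of "coração", as the script might receive them undecoded.
my $bytes = "cora\xC3\xA7\xC3\xA3o";

# Each byte above 0x7F is taken as a Latin-1 character and encoded
# again, so 0xC3 0xA7 turns into 0xC3 0x83 0xC2 0xA7.
my $double = encode('UTF-8', $bytes);

printf "%d bytes in, %d bytes out\n", length($bytes), length($double);
# prints "9 bytes in, 13 bytes out"
```

If the output of the real script shows the same growth, the value was probably never decoded on input.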

Thanks for any hint.

Hack: I noticed that if I apply Encode::_set_utf8 to the parameter, everything seems to work. But it is a hack.
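A possible alternative to flipping the UTF-8 flag by hand is to decode the parameter explicitly. The sketch below assumes the browser sent UTF-8 and uses a hand-written byte string in place of the real CGI parameter; the variable names are illustrative only:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for the percent-decoded value of ?word=cora%C3%A7%C3%A3o
# (assumption: the browser sent UTF-8).
my $raw = "cora\xC3\xA7\xC3\xA3o";

# Decode bytes into a character string. FB_CROAK dies on malformed
# input; LEAVE_SRC keeps $raw from being modified in place.
my $word = decode('UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC);

binmode(STDOUT, ':encoding(UTF-8)');
printf "%d bytes, %d characters\n", length($raw), length($word);
print "$word\n";
```

Unlike setting the flag directly, decoding validates the input, so invalid byte sequences are reported rather than silently accepted. CGI.pm also offers a '-utf8' import pragma that decodes parameters on its own; whether it fits depends on the CGI.pm version installed.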

Alberto Simões

Re: CGI.pm and URL parameters
by Corion (Patriarch) on May 17, 2009 at 13:01 UTC

    Your script still depends on what input encoding the browser is sending the form values in. You will need to trace the whole path between what the browser sends and what it tells you the encoding is, through how you process that data, to the output. Possibly, you have to add the encoding not only to a header but also as a META tag in the generated page.

      Unfortunately that doesn't change a thing. If only I could remember what changed on the machine...

      Alberto Simões

        Have you done this as the first communication to the browser?

        print "Content-type: text/html; charset=utf-8\n\n";

        IE requires this. Firefox will respect the <meta> tag, but IE needs UTF-8 to be declared in the Content-Type header.

        Blessings,

        ~Polyglot~

        What do you mean by "that"? Did you need to trace the whole path?