This is a follow-up to this decade-old thread. I got my CGI.pm scripts working well with UTF-8 input by using a current version of CGI.pm, which can automatically decode the incoming "param" data when given the '-utf8' pragma (this assumes the data was encoded as UTF-8 when it was sent to the script). So, instead of
use CGI;
one uses:
use CGI ('-utf8');
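As a rough illustration of what "decoding the param data" means, here is a minimal sketch that uses only the core Encode module rather than CGI.pm itself. The sample value is made up, and this is only meant to show the effect on a single value, not how CGI.pm implements the pragma internally.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical form value: the raw bytes a script would see after
# percent-decoding "caf%C3%A9" (the UTF-8 encoding of "cafe" with e-acute).
my $raw = "caf\xC3\xA9";

# Decoding turns the UTF-8 byte sequence into a Perl character string.
my $decoded = Encode::decode('UTF-8', $raw);

# The byte string has 5 elements, the character string has 4.
printf "raw length: %d, decoded length: %d\n", length($raw), length($decoded);
```

Without decoding, functions such as length, substr and regex character classes operate on the individual bytes rather than on the characters the user typed.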
That pragma seems to have eliminated the need for the "as_utf8" modification discussed in this thread long ago, for scripts using CGI::Application. I have quite a few CGI::Application scripts still running, so I needed a way to pass the -utf8 pragma to CGI.pm (used internally by CGI::Application) without changing the CGI::Application module's code. The solution was to add the following subroutine to each application module. It overrides the cgiapp_get_query method in the CGI::Application parent.
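sub cgiapp_get_query {
    my $self = shift;
    # 'use' is processed at compile time, so this could equally sit at the top of the module.
    use CGI ('-utf8');
    my $q = CGI->new;
    # Output side only: sets the charset in the generated Content-type header.
    $q->charset('UTF-8');
    return $q;
}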
The "$q->charset('UTF-8')" line is a separate matter. It isn't part of the automatic decoding of the param data; it affects the output. It causes CGI::Application to modify the header that is automatically generated at the end of a runmode, just before the content is displayed, so that it becomes:
Content-type: text/html; charset=UTF-8
I don't believe the charset method was part of CGI.pm back in the day, so the "$q->charset('UTF-8')" call may do nothing more than the technique in the code submitted by the OP already did, namely including this line in the "sub setup { ... }" of the application module:
$self->header_add( -charset => 'utf-8' );
Setting the header is not enough to print a UTF-8 encoded web page: I still have to encode Perl's native character strings into UTF-8. One way is to put this line somewhere near the top of the application module (i.e., not inside a subroutine):
binmode STDOUT, ':encoding(UTF-8)';
Or, instead of a binmode statement, I can add
use Encode;
to the application module and then say, at the end of each runmode:
return Encode::encode( 'UTF-8', $template->output );
rather than merely saying
return $template->output;
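The two output approaches should produce the same bytes, and using both at once would encode the page twice. The following is a small sketch of that, using core modules only; an in-memory filehandle stands in for STDOUT, and the sample string stands in for the template output (both are assumptions made for the demonstration).

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for $template->output: a character string of 4 characters.
my $page = "caf\x{E9}";

# Approach 1: an encoding layer on the handle (as binmode STDOUT would add).
open my $fh, '>:encoding(UTF-8)', \my $buffer or die "open failed: $!";
print {$fh} $page;
close $fh;

# Approach 2: explicit encoding of the returned string.
my $bytes = Encode::encode('UTF-8', $page);

# Doing both is a mistake: the already-encoded bytes get encoded again.
my $twice = Encode::encode('UTF-8', $bytes);

printf "layer: %d bytes, explicit: %d bytes, double-encoded: %d bytes\n",
    length($buffer), length($bytes), length($twice);
```

So it is one or the other: either the binmode layer or the explicit Encode::encode call in each runmode, not both.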
The "use utf8;" included in the code submitted by the OP isn't needed for the automatic decoding of the incoming param data or the encoding of the output.
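What "use utf8;" does affect is how Perl reads non-ASCII literals in the source file itself. A minimal sketch, assuming the script file is saved as UTF-8:

```perl
use strict;
use warnings;

# Without the pragma, the literal is read as the two bytes of its UTF-8 encoding.
my $as_bytes = do { no utf8; length("é") };

# With the pragma (lexically scoped), the same literal is read as one character.
my $as_chars = do { use utf8; length("é") };

printf "without use utf8: %d, with use utf8: %d\n", $as_bytes, $as_chars;
```

So the pragma only matters when the module's own source contains non-ASCII string literals or identifiers.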