http://qs321.pair.com?node_id=715461


in reply to Re: utf8 characters in tr/// or s///
in thread utf8 characters in tr/// or s///

When you fetch utf8 text from mysql, you should always run it through Encode::decode("utf8",...) -- update: or equivalent, as shown by ikegami
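For anyone following along, here is a minimal sketch of what that decoding step buys you. The byte string is a hand-written stand-in for a value fetched from MySQL (an assumption, no database is involved), and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for raw bytes fetched from MySQL: the UTF-8 encoding of "café".
my $bytes = "caf\xC3\xA9";

# Decode the bytes into a Perl character string.
my $text = decode('UTF-8', $bytes);

# The undecoded value is 5 bytes long; the decoded one is 4 characters long.
printf "%d bytes, %d characters\n", length($bytes), length($text);

# tr/// now sees U+00E9 as a single character rather than two bytes.
(my $plain = $text) =~ tr/\x{E9}/e/;

# Encode again before writing to a byte-oriented filehandle.
print encode('UTF-8', $plain), "\n";
```

Without the decode, the same tr/// would be matching against the individual bytes 0xC3 and 0xA9, which is the usual source of trouble in this kind of thread.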

With a fairly recent version of DBD::mysql, you can use the mysql_enable_utf8 option instead. To quote its documentation:

This attribute determines whether DBD::mysql should assume strings stored in the database are utf8. This feature defaults to off.

When set, a data retrieved from a textual column type (char, varchar, etc) will have the UTF-8 flag turned on if necessary. This enables character semantics on that string. You will also need to ensure that your database / table / column is configured to use UTF8. See Chapter 10 of the mysql manual for details.

Additionally, turning on this flag tells MySQL that incoming data should be treated as UTF-8. This will only take effect if used as part of the call to connect(). If you turn the flag on after connecting, you will need to issue the command SET NAMES utf8 to get the same effect.

This option is experimental and may change in future versions.

And yes, this is experimental, but it seemed fairly stable in my tests.
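A rough sketch of how this might look in practice, going by the documentation quoted above. The database name, host, credentials, table and column are all placeholders (assumptions, not anything from this thread), so adjust to taste:

```perl
use strict;
use warnings;
use DBI;

# Placeholder connection details, purely for illustration.
my $dsn  = 'DBI:mysql:database=testdb;host=localhost';
my $user = 'someuser';
my $pass = 'secret';

# Passing the attribute to connect() covers both directions, per the docs.
my $dbh = DBI->connect($dsn, $user, $pass, {
    RaiseError        => 1,
    mysql_enable_utf8 => 1,
});

# If the flag is only turned on after connecting, the docs say you also
# need to issue SET NAMES utf8 yourself:
#   $dbh->{mysql_enable_utf8} = 1;
#   $dbh->do('SET NAMES utf8');

# Hypothetical table and column; text columns should come back as
# character strings, so no explicit Encode::decode is needed here.
my ($name) = $dbh->selectrow_array('SELECT name FROM people LIMIT 1');
(my $plain = $name) =~ tr/\x{E9}/e/ if defined $name;

$dbh->disconnect;
```

As the docs point out, the database, table and column still need to be configured for UTF8 for any of this to work.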

--
b10m

Re^3: utf8 characters in tr/// or s///
by graff (Chancellor) on Oct 06, 2008 at 02:31 UTC
    This is good to know -- thanks++!

    Based on the description, it sounds like it may be a while before this sort of facility becomes "normal", to the extent that folks would find transitioning to it to be easier than staying with the older approach.

    The situation reminds me of a Larry Wall quote (in the perlunicode mail list, wouldn't you know) -- this was four years ago, but it still resonates:

    Perl's always been about providing reasonable defaults, and will continue to do so. But changing what's reasonable is tricky, and sometimes you have to go through a period in which nothing can be considered reasonable.