PerlMonks  

Re^2: Inserting UTF-8 on Mysql using DBI

by Fox (Pilgrim)
on Oct 15, 2010 at 10:53 UTC ( [id://865446]=note )


in reply to Re: Inserting UTF-8 on Mysql using DBI
in thread Inserting UTF-8 on Mysql using DBI

No, the mysql client (which is where I got the ???? from) is fine: when I execute this insert from PHP, I see the Japanese characters instead.
And retrieving the contents from MySQL into PHP gives me the same results as in the client.

Replies are listed 'Best First'.
Re^3: Inserting UTF-8 on Mysql using DBI
by Corion (Patriarch) on Oct 15, 2010 at 14:10 UTC

    So what did you do to cross-check that you're actually receiving and sending UTF-8 to/from MySQL? What data do you get in Perl when you insert a row using PHP? What data do you get in PHP when you insert a row using Perl? Try to eliminate the mysql client as a potential source of confusion: it might not output UTF-8 properly (I say without knowing the mysql client well). The question marks look to me like ignored or escaped Unicode characters.
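One way to do that kind of cross-check is to look at raw bytes rather than rendered characters. The following is a minimal Python sketch, not taken from the thread: the sample string and the `hexdump` helper are hypothetical, and Python's `errors="replace"` only stands in for whatever substitution MySQL or the client performs. It shows how real UTF-8 data and literal question marks differ at the byte level.

```python
def hexdump(data: bytes) -> str:
    """Render bytes as space-separated hex so no terminal or browser can reinterpret them."""
    return " ".join(f"{b:02x}" for b in data)

sample = "日本語"  # hypothetical test string, not from the thread

# What a correct UTF-8 byte stream looks like.
as_utf8 = sample.encode("utf-8")

# Forcing the text into a charset that cannot represent it:
# Python substitutes one "?" (0x3f) per unrepresentable character.
replaced = sample.encode("iso-8859-1", errors="replace")

print(hexdump(as_utf8))    # e6 97 a5 e6 9c ac e8 aa 9e
print(hexdump(replaced))   # 3f 3f 3f
```

If a hex dump of a fetched value shows `3f` bytes, the characters were already lost before they reached the display; if it shows multi-byte sequences, the data is intact and only the output side is misrendering it.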

      I found the problem: as it turns out, neither the mysql client nor PHP was using UTF-8 after all, only Perl... ugh. One thing I still don't understand, though, is why the characters were being displayed correctly in the browser even though the page always had the Content-Type UTF-8 header. I guess I understand charset encoding even less now.

        Browsers really like to make a "best effort" at guessing the content, even if they have to deviate from the Content-Type: text/html; charset=utf-8 header. Which is why eliminating all intermediaries and cross-checking all steps is the only approach I know that works.

        Smells like "the other" programs inserted UTF-8 byte streams that luckily came back unmodified from MySQL. So you could insert and fetch something that looked like UTF-8, even though MySQL converted the byte stream from what it thought to be ISO-8859-1 to broken UTF-8 while inserting, and back from broken UTF-8 to ISO-8859-1 while fetching. A big hint that this is going wrong is that the strings have the wrong length in the database (one or two extra characters for each non-ASCII character). Have a look at the Unicode tests in DBD::ODBC, especially t/40UnicodeRoundTrip.t and t/41Unicode.t.
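The round trip described above can be simulated without a database. This is a rough Python sketch under stated assumptions: the sample string is hypothetical, and plain `encode`/`decode` calls stand in for MySQL's server-side conversion, which may differ in detail (for example, MySQL's latin1 is not byte-for-byte identical to ISO-8859-1).

```python
text = "日本語"  # hypothetical sample; any non-ASCII string would do

# The client sends UTF-8 bytes but the connection is declared as ISO-8859-1.
utf8_bytes = text.encode("utf-8")

# The server believes each byte is one ISO-8859-1 character...
misread = utf8_bytes.decode("iso-8859-1")

# ...and converts those characters to UTF-8 for storage ("broken UTF-8").
stored = misread.encode("utf-8")

# On fetch, the server converts back to ISO-8859-1 for the client.
fetched = stored.decode("utf-8").encode("iso-8859-1")

print(len(text), len(misread))   # 3 9
print(fetched == utf8_bytes)     # True
```

The client gets back exactly the bytes it sent, so everything looks fine from outside, while the database holds nine characters where there should be three (two extra per three-byte character).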

        The browser shows the correct characters because you told it explicitly to do so: There is a UTF-8 byte stream in the HTML resource delivered by the server, and the HTML resource (or its headers) says that it is encoded as UTF-8. It simply does not matter that the software generating the page accidentally or intentionally wrote that byte stream as what it thought to be ISO-8859-1 characters.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
