So what did you do to cross-check that you're actually receiving and sending UTF-8 to/from MySQL? What data do you get in Perl when you insert a row using PHP? What data do you get in PHP when you insert a row using Perl? Try to eliminate the mysql client as a potential source of confusion: it might not output UTF-8 properly (I say this without knowing the mysql client well). The question marks look to me like ignored or escaped Unicode characters.
I found the problem. As it turns out, neither the mysql client nor PHP was using UTF-8 after all, only Perl. Ugh. One thing I still don't understand, though, is why the characters were being displayed correctly in the browser despite the fact that the page always had the Content-Type UTF-8 header. I guess I understand charset encoding even less now.
Smells like "the other" programs inserted UTF-8 byte streams that luckily came back unmodified from MySQL. So you could insert and fetch something that looked like UTF-8, even when MySQL converted the byte stream from what it thought to be ISO-8859-1 to broken UTF-8 while inserting, and back from broken UTF-8 to ISO-8859-1. A big hint for such things going wrong is that the strings have the wrong length in the database (one or two extra characters for each non-ASCII character). Have a look at the Unicode tests in DBD::ODBC, especially t/40UnicodeRoundTrip.t and t/41Unicode.t.
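To make the "lucky round trip" concrete, here is a minimal sketch in Python (not Perl, purely for illustration). It does not talk to MySQL at all; it only simulates, under the assumption that the connection is treated as ISO-8859-1 and the column is stored as UTF-8, what happens to a UTF-8 byte stream on the way in and out. The sample string and all variable names are made up for this example.

```python
# Simulation only: no database involved. We assume the server believes the
# client sends ISO-8859-1 and stores the column as UTF-8.
original = "Grüße"                        # 5 characters
utf8_bytes = original.encode("utf-8")     # 7 bytes: ü and ß take two bytes each

# Insert: the server reads each byte as one ISO-8859-1 character
# and converts the result to UTF-8 for storage ("broken UTF-8").
stored_text = utf8_bytes.decode("iso-8859-1")   # 7 characters of mojibake
stored_bytes = stored_text.encode("utf-8")      # what ends up on disk

# Fetch: the server converts the stored UTF-8 back to ISO-8859-1
# for the client, which undoes the damage byte for byte.
fetched_bytes = stored_bytes.decode("utf-8").encode("iso-8859-1")

round_trip_ok = fetched_bytes == utf8_bytes      # client gets its bytes back
extra_chars = len(stored_text) - len(original)   # the length hint

print(round_trip_ok, len(original), len(stored_text), extra_chars)
```

In this simulation the client gets its original bytes back, yet the stored string is two characters longer than the real one, which is the length hint mentioned above. A client that really speaks UTF-8 to the server (like the Perl program here) would see the stored mojibake instead.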
The browser shows the correct characters because you told it explicitly to do so: There is a UTF-8 byte stream in the HTML resource delivered by the server, and the HTML resource (or its headers) says that it is encoded as UTF-8. It simply does not matter that the software generating the page accidentally or intentionally wrote that byte stream as what it thought to be ISO-8859-1 characters.
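The same idea as a small Python sketch, again only a simulation with made-up names: the page generator is assumed to treat the fetched bytes as ISO-8859-1 characters and to write them out unchanged, while the browser decodes whatever arrives using the declared charset.

```python
# Simulation only: bytes as they came back, unmodified, from the database.
page_bytes = "Grüße".encode("utf-8")

# What the page generator believes it is handling: ISO-8859-1 characters.
what_the_program_sees = page_bytes.decode("iso-8859-1")

# It writes those "characters" out one byte each, so the bytes are unchanged.
bytes_on_the_wire = what_the_program_sees.encode("iso-8859-1")

# The browser decodes according to the charset declared in the header.
declared_charset = "utf-8"
what_the_browser_shows = bytes_on_the_wire.decode(declared_charset)

print(what_the_browser_shows)
```

The program's own view of the string is garbage, but since it never touches the bytes, the browser still decodes them correctly.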
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)