PerlMonks  

Re^4: Inserting UTF-8 on Mysql using DBI

by Fox (Pilgrim)
on Oct 16, 2010 at 12:48 UTC ( [id://865658] )


in reply to Re^3: Inserting UTF-8 on Mysql using DBI
in thread Inserting UTF-8 on Mysql using DBI

I found the problem. As it turns out, neither the mysql client nor PHP was using UTF-8 after all, only Perl. Ugh. One thing I still don't understand, though, is why the characters were being displayed correctly in the browser, despite the fact that the page always had the UTF-8 Content-Type header. I guess I understand charset encoding even less now.

Replies are listed 'Best First'.
Re^5: Inserting UTF-8 on Mysql using DBI
by Corion (Patriarch) on Oct 16, 2010 at 17:10 UTC

    Browsers really like to make a "best effort" at guessing the content, even if they have to deviate from the Content-Type: text/html; charset=utf-8 header. Which is why eliminating all intermediaries and cross-checking all steps is the only approach I know that works.

Re^5: Inserting UTF-8 on Mysql using DBI
by afoken (Chancellor) on Oct 18, 2010 at 14:43 UTC

    Smells like "the other" programs inserted UTF-8 byte streams that luckily came back unmodified from MySQL. So you could insert and fetch something that looked like UTF-8, even when MySQL converted the byte stream from what it thought to be ISO-8859-1 to broken UTF-8 while inserting, and back from broken UTF-8 to ISO-8859-1. A big hint for such things going wrong is that the strings have the wrong length in the database (one or two extra characters for each non-ASCII character). Have a look at the Unicode tests in DBD::ODBC, especially t/40UnicodeRoundTrip.t and t/41Unicode.t.
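
    To make the length hint concrete, here is a minimal sketch of that double conversion at the byte level. It is written in Python rather than Perl purely to show the bytes, and it talks to no database. The helper names `store_as_latin1` and `fetch_as_latin1` are made up for illustration, and the sketch assumes the connection is treated as ISO-8859-1 while the column is stored as UTF-8.

```python
def store_as_latin1(client_bytes: bytes) -> bytes:
    """Hypothetical stand-in for the insert path: the bytes are taken to be
    ISO-8859-1 and converted to UTF-8 for storage."""
    return client_bytes.decode("iso-8859-1").encode("utf-8")


def fetch_as_latin1(stored_bytes: bytes) -> bytes:
    """Hypothetical stand-in for the fetch path: the stored UTF-8 is converted
    back to ISO-8859-1 for the client."""
    return stored_bytes.decode("utf-8").encode("iso-8859-1")


text = "caf\u00e9"                  # "café", 4 characters
sent = text.encode("utf-8")         # 5 bytes: the é is 0xC3 0xA9

stored = store_as_latin1(sent)      # each input byte became one stored character
returned = fetch_as_latin1(stored)  # the reverse conversion restores the bytes

stored_chars = len(stored.decode("utf-8"))
print(len(text), stored_chars, returned == sent)
```

    Under these assumptions the client gets its own bytes back unchanged, but the stored value is one character longer than the text for each two-byte character, and two characters longer for a three-byte character such as "€".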

    The browser shows the correct characters because you told it explicitly to do so: There is a UTF-8 byte stream in the HTML resource delivered by the server, and the HTML resource (or its headers) says that it is encoded as UTF-8. It simply does not matter that the software generating the page accidentally or intentionally wrote that byte stream as what it thought to be ISO-8859-1 characters.
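
    The same point as a rough sketch, again in Python only for illustration: what gets displayed depends on the bytes in the response and the charset they are decoded with, not on what the generating program believed it was writing. The function `render` is a hypothetical stand-in for the browser's decoding step, not a real browser API.

```python
def render(body_bytes: bytes, declared_charset: str) -> str:
    """Hypothetical stand-in for a browser decoding a response body with the
    charset named in the Content-Type header."""
    return body_bytes.decode(declared_charset)


# Bytes that happen to be valid UTF-8, however the producer thought of them.
page_bytes = "na\u00efve caf\u00e9".encode("utf-8")

as_utf8 = render(page_bytes, "utf-8")         # decoded as declared: correct text
as_latin1 = render(page_bytes, "iso-8859-1")  # decoded with the wrong charset

print(as_utf8)
print(as_latin1)
```

    Only the second call produces garbage, which is what the page would have shown had the header said ISO-8859-1.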

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
