I've written a web-based content management system using CGI.pm and DBI (with DBD::mysql). A site administrator can type text into a <TEXTAREA> box and it gets saved to the database. On a schedule, the system then generates JavaScript files with document.write statements that spit back out parts of the text.
The problem is that the main site administrator tends to write her content in MS Word first, then copy and paste it into my form. Word makes all sorts of fun character substitutions, such as curly quotes, turning -- into an em-dash, changing ... into a single ellipsis character, etc. I suspect that some of these characters either break on the way into the database (where the text is stored as type blob) or break inside the quotes of the document.write in the .js files I'm generating. (I'm escaping regular single and double quotes before putting them into the document.write.)
My question is: how does Perl identify these special characters? They're not 7-bit ASCII. Should I use ord on a test sample, then work out a substitution table on my own? Or is there a module that already handles this for me? (charnames?)
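To make the ord idea concrete, here is a rough sketch of what I have in mind. It assumes the pasted text arrives as Windows-1252 bytes, which I haven't confirmed; if the form actually submits UTF-8, the byte values would be different. The helper names find_non_ascii and replace_word_chars are made up for illustration, and the table only covers a handful of the characters Word produces.

```perl
use strict;
use warnings;

# Assumption: input is Windows-1252 bytes. Partial table, not exhaustive.
my %cp1252_to_ascii = (
    "\x85" => '...',   # horizontal ellipsis
    "\x91" => "'",     # left single curly quote
    "\x92" => "'",     # right single curly quote
    "\x93" => '"',     # left double curly quote
    "\x94" => '"',     # right double curly quote
    "\x96" => '-',     # en dash
    "\x97" => '--',    # em dash
);

# Hypothetical helper: list every byte outside 7-bit ASCII with its offset.
sub find_non_ascii {
    my ($text) = @_;
    my @found;
    while ($text =~ /([^\x00-\x7F])/g) {
        push @found, sprintf('0x%02X at offset %d', ord($1), pos($text) - 1);
    }
    return @found;
}

# Hypothetical helper: swap the known Word characters for plain ASCII.
sub replace_word_chars {
    my ($text) = @_;
    $text =~ s/([\x85\x91-\x94\x96\x97])/$cp1252_to_ascii{$1}/g;
    return $text;
}

my $sample = "\x93Hello\x94 \x97 it\x92s here\x85";
print "$_\n" for find_non_ascii($sample);
print replace_word_chars($sample), "\n";
```

Running find_non_ascii on a real pasted sample would at least tell me which byte values are actually showing up before I commit to a table like this.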