PerlMonks  

Special Character Substitutions

by cei (Monk)
on Oct 04, 2001 at 03:26 UTC ( [id://116602]=perlquestion )

cei has asked for the wisdom of the Perl Monks concerning the following question:

I've written a web-based content management system using CGI.pm and DBI (with DBD::mysql). A site administrator can type text into a <TEXTAREA> box and it gets saved to the database. On a schedule, it then generates JavaScript files with document.write statements that spit back out parts of the text.

The problem is, the main site administrator tends to write her content in MS Word first, then copies and pastes it into my form. Word does all sorts of fun character substitutions, such as curly quotes, turning -- into an em-dash, changing ... into a single ellipsis character, etc. I suspect that some of these characters sometimes either break coming into the database (being stored as type blob) or break within the quotes of the document.write in the .js files I'm generating. (I'm escaping regular single and double quotes before putting them into the document.write.)

My question is, how does Perl identify these special characters? They're not 7-bit ASCII. Should I use ord on a test sample, then work out a substitution table on my own? Or is there a module that already handles this for me? (charnames?)
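
The "use ord on a test sample" idea can be sketched roughly as follows. This is only an illustration: it assumes the form data arrives as raw, undecoded bytes, and both the helper name find_high_bytes and the sample string are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: report every byte outside 7-bit ASCII as "offset: 0xNN".
# Assumes $text holds raw bytes as submitted, not decoded characters.
sub find_high_bytes {
    my ($text) = @_;
    my @found;
    while ($text =~ /([^\x00-\x7F])/g) {
        # pos() points just past the match, so subtract one for its offset
        push @found, sprintf("%d: 0x%02X", pos($text) - 1, ord($1));
    }
    return @found;
}

# Made-up sample: Word-style curly double quotes and an ellipsis byte.
my $sample = "She said \x93hello\x94\x85";
print "$_\n" for find_high_bytes($sample);
```

Running something like this over a real pasted sample shows which byte values actually reach the script, which is the information a substitution table would be built from.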

Replies are listed 'Best First'.
Re: Special Character Substitutions
by Fletch (Bishop) on Oct 04, 2001 at 04:33 UTC

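    You're right, they're not ASCII. They're not Latin-1 either, and the raw bytes aren't Unicode. Not satisfied with a standard character set, M$ came up with their own non-standard extensions: the Windows-1252 code page puts curly quotes, dashes, and the ellipsis in the 0x80-0x9F byte range, which Latin-1 reserves for control characters. Check out the demoroniser for something that converts things back to a standard character set.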
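
    For a sense of what that kind of conversion involves, here is a minimal sketch of a hand-rolled substitution table. It is not the demoroniser's own code. It assumes the text arrives as raw Windows-1252 bytes, the table is deliberately incomplete, and the names %cp1252_to_ascii and demoronise_text are invented for the example.

```perl
use strict;
use warnings;

# Windows-1252 bytes from the 0x80-0x9F range mapped to plain ASCII stand-ins.
# Illustrative only; a real table would cover more of the range.
my %cp1252_to_ascii = (
    "\x85" => '...',   # horizontal ellipsis
    "\x91" => "'",     # left single quote
    "\x92" => "'",     # right single quote
    "\x93" => '"',     # left double quote
    "\x94" => '"',     # right double quote
    "\x96" => '-',     # en dash
    "\x97" => '--',    # em dash
);

# Hypothetical helper: replace each mapped byte, leave unmapped bytes untouched.
sub demoronise_text {
    my ($text) = @_;
    $text =~ s/([\x80-\x9F])/exists $cp1252_to_ascii{$1} ? $cp1252_to_ascii{$1} : $1/ge;
    return $text;
}

print demoronise_text("\x93Wait\x85\x94 she said \x97 it\x92s fine"), "\n";
# prints: "Wait..." she said -- it's fine
```

    If the browser submits the form as UTF-8 instead, the same characters arrive as multi-byte sequences and this byte-level table will not match them, so it is worth checking what the bytes look like before relying on it.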

Re: Special Character Substitutions
by boo_radley (Parson) on Oct 04, 2001 at 05:38 UTC
    Dear $administrator,

    Please don't do that. Not only do you make Baby Jesus cry and warp space-time by using Microsoft products, you also risk corrupting data in the relevant database and preventing the JavaScript generator from working properly. Normally, I'd just route around your actions by using a program called Demoroniser, but then I realized that you are a technically capable person, and can either set MS Word to save as plain text, or use an alternate editor for this function.
    Should this not prove to be the case, we can always work out an alternative where I have to mistrust your work.

    Kindest regards,
    $name

    bad mood? why do you ask?
