PerlMonks
Text Encoding on this site's HTML
by John M. Dlugosz (Monsignor)
on Dec 24, 2002 at 02:57 UTC ( [id://222023]=monkdiscuss )
The pages on this site are marked as being in the Latin-1 character set. Increasingly, though, we are seeing UTF-8 being pasted into code listings. The <code> blocks are immune from & expansion by design, so you can't just code HTML entities for funny chars.

So... why can't this site do it for us? We could have a <code utf-8> block and a <code Windows> block, etc. The display formatting logic would always turn chars beyond basic ASCII into named entities or numeric Unicode entities, so the text displays properly regardless of the browser's setting (or it could convert to match the page's stated charset, for characters that exist in that character set).

A variation would be to have some other attribute in the opening <code> tag to indicate that some escape character is used in the code block, so we could write such things if we wanted to.

I think a smart default would work, too. If a code block contains characters beyond 127 that form legal UTF-8 encodings, it could assume (by default) that the block is in fact UTF-8 and convert them to entities. If that's not correct, it would show in the preview window. Getting it wrong is no worse than the current situation with forgetting to escape square brackets.

I think changing the sent HTML to UTF-8 is not a solution, since we would continue to have both 8-bit characters and UTF-8 pasted into input fields. The solution is to allow either for input.
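
To make the "smart default" idea concrete, here is a rough sketch in Python. It is not how the site's formatter works. The function name `code_block_to_html`, the `declared` parameter (standing in for a hypothetical `<code utf-8>` style attribute), and the Latin-1 fallback are all assumptions made for illustration. It emits numeric entities only, which is the simpler of the two options mentioned above.

```python
import html
from typing import Optional


def code_block_to_html(raw: bytes, declared: Optional[str] = None) -> str:
    """Return ASCII-only HTML for the raw bytes of a code block.

    `declared` is a hypothetical stand-in for an attribute on the opening
    <code> tag (for example "utf-8" or "cp1252"). When it is absent, the
    smart default applies: try UTF-8 first, fall back to Latin-1.
    """
    if declared is not None:
        text = raw.decode(declared)
    else:
        try:
            # Pure ASCII and legal UTF-8 both decode here.
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not legal UTF-8, so assume the page's stated charset (assumption).
            text = raw.decode("latin-1")

    # Escape &, < and > so the code is shown literally.
    escaped = html.escape(text, quote=False)
    # Turn every character beyond ASCII into a numeric entity such as &#233;
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

With this sketch, an "é" pasted as UTF-8 (bytes C3 A9) and an "é" pasted as Latin-1 (byte E9) both come out as `&#233;`, so the result does not depend on the browser's charset setting. A short Latin-1 string can happen to be legal UTF-8 and be misread, which is the case the preview window would have to catch.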