http://qs321.pair.com?node_id=596686



"'" => "&apos;",

The apos entity is an XML built-in and isn't defined for HTML. While some browsers support it in text/html documents, that is error correction, and you should not rely on it.
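For illustration, a corrected version of an escape map like the one quoted above might look like this minimal Perl sketch. The surrounding code wasn't shown, so the `%ESCAPE` hash and the `escape_html` function are hypothetical. The sketch uses the numeric reference `&#39;` for the apostrophe and escapes in a single pass, so that ampersands produced by earlier replacements aren't escaped twice.

```perl
use strict;
use warnings;

# Character-to-reference map (hypothetical). The apostrophe uses the
# numeric reference &#39;, which is valid in HTML; &apos; comes from XML.
my %ESCAPE = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# One substitution pass over the input: each character is examined once,
# so the '&' inside an inserted reference is never re-escaped.
sub escape_html {
    my ($text) = @_;
    (my $escaped = $text) =~ s/([&<>"'])/$ESCAPE{$1}/g;
    return $escaped;
}

print escape_html(q{Tom's <b>"bold"</b> & more}), "\n";
# prints: Tom&#39;s &lt;b&gt;&quot;bold&quot;&lt;/b&gt; &amp; more
```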

It's best to escape the data as it's coming in; otherwise it's very difficult to distinguish between, for example, a less-than sign that should be converted to &lt; and one that is part of the markup.

My preference is to convert from text to HTML at the last minute, to avoid issues when I need to manipulate the data in Perl. (Template::Stash::EscapeHTML is quite cool.)

What matters, though, is doing it in one place, so it's easy to spot when you forget to protect a bit of user input from XSS et al.
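A minimal Perl sketch of converting at the last minute, in one place, might look like this. The `render` and `escape_html` functions and the `$NAME` placeholder syntax are hypothetical, and this is not Template::Stash::EscapeHTML's API, just the same idea in miniature: values stay plain text while Perl works on them and are escaped only inside `render`.

```perl
use strict;
use warnings;

# The single place where text becomes HTML (hypothetical helper).
# &#39; is used for the apostrophe because &apos; is not an HTML 4 entity.
my %ESCAPE = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&#39;');

sub escape_html {
    (my $html = shift) =~ s/([&<>"'])/$ESCAPE{$1}/g;
    return $html;
}

# Hypothetical template filler: replaces each $NAME placeholder with the
# escaped value from %$vars, and dies if a placeholder has no value.
sub render {
    my ($template, $vars) = @_;
    (my $html = $template) =~ s{\$(\w+)}{
        exists $vars->{$1} ? escape_html($vars->{$1})
                           : die "no value for \$$1\n"
    }ge;
    return $html;
}

# While the data is still plain text, Perl can trim it safely.
# Truncating first and escaping last gives "Fish &amp; c"; escaping
# first and then truncating would cut the entity to "Fish &am".
my $desc = substr(q{Fish & chips}, 0, 8);

print render(q{<img src='$IMAGE1' alt='$DESCRIPTION1'>},
             { IMAGE1 => 'fish.png', DESCRIPTION1 => $desc }), "\n";
```

Because every value passes through `render`, no call site can forget to escape one, and a misspelled placeholder fails loudly instead of silently producing an empty attribute.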

Re^3: clean html tags
by sgifford (Prior) on Jan 26, 2007 at 19:30 UTC
    The apos entity is an XML built-in and isn't defined for HTML. While some browsers support it in text/html documents, that is error correction, and you should not rely on it.
    Ah, that's interesting. I find it very useful to ensure that user-generated text doesn't break out of an HTML or JavaScript string, which is a big win IMHO. For example, if a template says:
    <img src='$IMAGE1' alt='$DESCRIPTION1'>
    I can be sure that $IMAGE1 and $DESCRIPTION1 won't mess up my HTML formatting if I can ensure they don't contain apostrophes, but otherwise it's impossible.

    Are you aware of any browsers that don't support this entity in HTML?

      Ah, that's interesting. I find it very useful to ensure that user-generated text doesn't break out of an HTML or JavaScript string

      You get the same effect if you use the numeric character reference as described in the document I previously linked to, or avoid delimiting attribute values with single quotes and use the more conventional double quotes.
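The two alternatives above can be compared with a small Perl sketch. The payload, the `alt_value` helper and its regex-based extraction are illustrative assumptions, not a real HTML parser. The sketch shows an apostrophe in user input closing a single-quoted `alt` attribute early, and how either the `&#39;` reference or a double-quoted attribute keeps the whole value inside `alt`.

```perl
use strict;
use warnings;

# Hypothetical user input crafted to close a single-quoted attribute.
my $payload = q{x' onerror='alert(1)};

# Naive helper (illustration only, not an HTML parser): returns the alt
# value, delimited by whichever quote character opens it.
sub alt_value {
    my ($html) = @_;
    return $html =~ /alt=(['"])(.*?)\1/ ? $2 : undef;
}

# 1. Unescaped inside single quotes: the apostrophe ends alt early and
#    onerror becomes a separate attribute.
my $raw = "<img src='cat.png' alt='${payload}'>";

# 2. Numeric character reference: escape & first, then turn every
#    apostrophe into &#39;.
(my $numeric = $payload) =~ s/&/&amp;/g;
$numeric =~ s/'/&#39;/g;
my $with_ncr = "<img src='cat.png' alt='${numeric}'>";

# 3. Double-quoted attribute: an apostrophe is ordinary text here,
#    but & and " must still be escaped.
(my $dq = $payload) =~ s/&/&amp;/g;
$dq =~ s/"/&quot;/g;
my $with_dq = qq{<img src="cat.png" alt="$dq">};

printf "%-9s alt=%s\n", $_->[0], alt_value($_->[1])
    for ['raw', $raw], ['numeric', $with_ncr], ['double', $with_dq];
```

In the raw case the extracted `alt` value is just `x`, with `onerror` left over as an attribute of its own; in both fixed cases the full payload stays inside `alt`.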

      Are you aware of any browsers that don't support this entity in HTML?

      Not off the top of my head, but using it in text/html is non-standard, and it's easy to avoid.

        To follow up: I ignored dorward's advice and left this in, and it turns out it doesn't work well in some little browser called "Internet Explorer," which apparently some people like to use. :-)

        Changing &apos; to &#39; fixed the problem, as he suggested it would.