http://qs321.pair.com?node_id=812752


in reply to Re^5: Wierd behaviour with HTML::Entities::decode_entities()
in thread Wierd behaviour with HTML::Entities::decode_entities()

If HTML::Entities were to decode &amp;quot; to ", it would be buggy. I did understand you correctly. Does what I said make more sense now?

I wonder if decode_entities (not decode_entities_old) handles it correctly?

I don't know if that's legal in SGML/HTML (an unescaped ampersand), but yes.

$ perl -MHTML::Entities -le'print decode_entities """'
"

Re^7: Wierd behaviour with HTML::Entities::decode_entities()
by Baz (Friar) on Dec 14, 2009 at 19:05 UTC
    You can test with this:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sv" lang="sv">
    <head>
    <meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1"/>
    </head>
    <body>
    Mic i vår replokal &amp;quot;The Dungeon&amp;quot;
    </body>
    </html>
    Now it's very clear what the problem is...

      If you get

      Mic i vår replokal &amp;quot;The Dungeon&amp;quot;

      and you're expecting

      Mic i vår replokal "The Dungeon"

      the problem is on their end. As you can see by loading that page in a browser,

      Mic i vår replokal &amp;quot;The Dungeon&amp;quot;

      means

      Mic i vår replokal &quot;The Dungeon&quot;
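
For anyone who wants to check this without a browser, here is a minimal sketch, assuming HTML::Entities is installed. The variable names and the shortened sample string are mine, not from the page in question. It shows that one decoding pass over the double-encoded text yields &quot;, and that only a second pass yields a literal quote character.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Shortened sample of the double-encoded text from the page source
# (hypothetical stand-in for the full line).
my $source = '&amp;quot;The Dungeon&amp;quot;';

# One pass, which is the correct decoding: &amp; becomes &,
# leaving the characters "&quot;" as plain text.
my $once = decode_entities($source);

# A second pass is a workaround for the upstream double encoding,
# not something decode_entities should do by itself.
my $twice = decode_entities($once);

print "$once\n";
print "$twice\n";
```

Decoding twice would also mangle text that is correctly encoded and merely happens to mention an entity, so it is only reasonable when you know the source is double-encoded.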