I have a great deal of text that has both en dashes and em dashes (– and — respectively) within HTML files as plain text. Since my editor gladly converts these (nary a complaint), I usually don't pay any attention. However, I recently noticed a problem with the
HTML::Entities encode_entities function; i.e.
encode_entities("How the Chimney–sweeper's cry,")
produces:
How the Chimney&#150;sweeper's cry,
rather than:
How the Chimney”sweeper's cry,
Now that I've spotted the problem, I can easily do the necessary regex massage and have it go away, but I was wondering if anyone knows the necessary Unicode/UTF-8 incantation to avoid the problem in the first place (if that is in fact what it is)? Note that the em dash is translated to &#151; instead of „. I have not checked the other typical HTML typographical elements yet; these two are so common that the problem surfaced fairly quickly.
Note: I leave the typos as written, but I really meant &mdash; and &ndash; *sigh*
Note: https://stackoverflow.com/questions/631406/what-is-the-difference-between-em-dash-151-and-8212 seems pertinent...
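For what it's worth, here is a minimal sketch of the kind of "incantation" being asked about. It rests on an assumption that may not hold for my files: that the dashes are stored as Windows-1252 bytes (0x96 for the en dash, 0x97 for the em dash), which would explain the 150/151 numeric references. The idea is to decode the bytes into Perl characters with the core Encode module before calling encode_entities, so that HTML::Entities sees U+2013 and U+2014 and can pick the named entities. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Assumption: the input is raw Windows-1252 bytes, where 0x96 is the
# en dash and 0x97 is the em dash.
my $bytes = "Chimney\x96sweeper \x97 cry";

# Encoding the undecoded bytes treats 0x96/0x97 as Latin-1 code points,
# so only numeric references can come out.
my $undecoded = encode_entities($bytes);

# Decode first: 0x96 becomes U+2013 and 0x97 becomes U+2014, both of
# which HTML::Entities has named entities for.
my $text = decode('cp1252', $bytes);
my $html = encode_entities($text);

print "without decode: $undecoded\n";
print "with decode:    $html\n";
```

If the files turn out to be UTF-8 instead, the same sketch should apply with decode('UTF-8', ...) in place of decode('cp1252', ...); the point is only that encode_entities needs character strings, not raw bytes.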
--hsm
"Never try to teach a pig to sing...it wastes your time and it annoys the pig."