http://qs321.pair.com?node_id=554900


in reply to Simplify HTML programatically

On a somewhat related note, if you want to remove some of the special characters that MS Word creates, you can use the following code, which is based on the excellent demoroniser code.
sub nukeMSsmarts {
    my $s = shift;

    # Map incompatible CP-1252 characters
    $s =~ s/\x82/,/g;
    $s =~ s-\x83-<em>f</em>-g;
    $s =~ s/\x84/,,/g;
    $s =~ s/\x85/.../g;
    $s =~ s/\x88/^/g;
    $s =~ s-\x89- °/°°-g;
    $s =~ s/\x8B/</g;
    $s =~ s/\x8C/Oe/g;
    $s =~ s/\x91/'/g;
    $s =~ s/\x92/'/g;
    $s =~ s/\x93/"/g;
    $s =~ s/\x94/"/g;
    $s =~ s/\x95/*/g;
    $s =~ s/\x96/-/g;
    $s =~ s/\x97/--/g;
    $s =~ s-\x98-<sup>~</sup>-g;
    $s =~ s-\x99-<sup>TM</sup>-g;
    $s =~ s/\x9B/>/g;
    $s =~ s/\x9C/oe/g;

    # Now check for any remaining untranslated characters.
    $s =~ s/[\x00-\x08\x10-\x1F\x80-\x9F]/*/g;

    return $s;
}
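If the long run of substitutions gets hard to maintain, the same idea can be sketched as a lookup table with a single pass over the string. This is only an illustration, not the demoroniser's own approach: the names `%cp1252_map` and `replace_smart_chars` are made up for this example, the table covers just a subset of the characters handled above, and it assumes the input is still raw CP-1252 bytes (not text already decoded to Unicode).

```perl
use strict;
use warnings;

# Hypothetical table-driven variant: CP-1252 byte => plain ASCII replacement.
# Only a subset of the mappings from nukeMSsmarts is listed here.
my %cp1252_map = (
    "\x85" => '...',   # ellipsis
    "\x91" => "'",     # left single quote
    "\x92" => "'",     # right single quote
    "\x93" => '"',     # left double quote
    "\x94" => '"',     # right double quote
    "\x96" => '-',     # en dash
    "\x97" => '--',    # em dash
);

sub replace_smart_chars {
    my ($s) = @_;

    # One pass over the \x80-\x9F range: use the table entry if there is one,
    # otherwise fall back to '*' as the catch-all for untranslated characters.
    $s =~ s/([\x80-\x9F])/exists $cp1252_map{$1} ? $cp1252_map{$1} : '*'/ge;

    return $s;
}

my $raw = "\x93Hello\x94 \x96 it\x92s done\x85";
print replace_smart_chars($raw), "\n";   # prints: "Hello" - it's done...
```

Unlike the original sub, this sketch leaves the low control characters (\x00-\x08, \x10-\x1F) alone; add a second substitution if you want those replaced as well.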

-------------------------------------
Nothing is too wonderful to be true
-- Michael Faraday