"be consistent" | |
PerlMonks
Re^4: Safely removing Unicode zero-width spaces and other non-printing characters
by mldvx4 (Friar) on Dec 05, 2019 at 05:33 UTC ([id://11109685])
Yes, the RSS reads fine, of course. The problem is with the pages that the RSS points to. HTML and XHTML are a hot mess. Even when a respectable CMS is used, the authors can still paste in something weird. It looks like I may have to treat each site individually, and writing individual filters might not be worth the effort. However, I am hoping for an automated way to normalize incoming text.