PerlMonks
Re: Regex For Removing Emoji
by Corion (Patriarch) on Nov 12, 2016 at 14:12 UTC ( [id://1175788]=note )
Also see Text::Unidecode and, especially for sanitizing titles for URLs, Text::CleanFragment. Both err on the side of leaving things out rather than keeping things in. It seems your regular expressions attempt to remove whole Unicode character planes. Personally, I would explicitly allow some character planes, or look at the Unicode properties (maybe via Unicode::Tussle) to find out whether a character is part of a script. Also consider what you want to do with character art: (╯°□°)╯︵ ┻━┻
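To make the "explicitly allow" idea concrete, here is a minimal sketch using Perl's built-in Unicode properties. The helper name keep_allowed and the particular allow-list (Latin-script letters, decimal digits, whitespace, and a handful of ASCII punctuation marks) are assumptions chosen for illustration, not something from the original question; adjust the list to whatever your data actually needs.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every character that is NOT on the allow-list.
# The allow-list below is an assumption for illustration only.
sub keep_allowed {
    my ($text) = @_;
    # Keep Latin-script letters, decimal digits, whitespace,
    # and a few ASCII punctuation marks; remove everything else.
    (my $clean = $text) =~ s/[^\p{Script=Latin}\p{Nd}\s.,;:!?'"()\-]//g;
    return $clean;
}

# U+1F600 is an emoji; U+00F6 is "o" with diaeresis (Latin script).
my $title = keep_allowed("Hello \x{1F600} w\x{F6}rld!");

binmode STDOUT, ':encoding(UTF-8)';
print "$title\n";
```

With an allow-list like this, emoji disappear without naming any emoji ranges at all, and accented Latin letters survive. Character art such as the table flip above gets mostly stripped too, since box-drawing characters and symbols are not on the list, so decide up front whether that is what you want.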