PerlMonks  

Re: Regex For Removing Emoji

by Corion (Patriarch)
on Nov 12, 2016 at 14:12 UTC


in reply to Regex For Removing Emoji

Also see Text::Unidecode and, especially for sanitizing titles for URLs, Text::CleanFragment.

Both err on the side of leaving things out rather than keeping things in.

It seems your regular expressions attempt to remove whole Unicode character planes. Personally, I would explicitly allow some character planes, or look at the Unicode properties (maybe via Unicode::Tussle) to find out whether a character is part of a script.
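As a rough illustration of the allow-list idea, here is a minimal sketch that keeps only Latin-script letters, decimal digits, ASCII punctuation and whitespace, using Perl's built-in Unicode properties. The sub name keep_allowed and the particular set of allowed properties are assumptions made for this example, not something taken from the original question; adjust the character class to whatever scripts you actually want to let through.

```perl
use strict;
use warnings;

# Hypothetical helper: strip every character that is NOT on the allow-list.
# Allowed here (an assumption for this sketch): Latin script, decimal digits,
# ASCII punctuation and whitespace.
sub keep_allowed {
    my ($text) = @_;
    (my $clean = $text) =~ s/[^\p{Script=Latin}\p{Nd}\p{PosixPunct}\s]+//g;
    $clean =~ s/\s+/ /g;      # collapse whitespace runs left behind by removals
    $clean =~ s/^ | $//g;     # trim a leading or trailing space
    return $clean;
}

binmode STDOUT, ':encoding(UTF-8)';

# "Café", a grinning-face emoji (U+1F600), then plain ASCII text
print keep_allowed("Caf\x{E9} \x{1F600} menu 2016!"), "\n";   # prints "Café menu 2016!"
```

Because anything not explicitly allowed is dropped, emoji that get added to Unicode later are removed without touching the pattern. The price is that Cyrillic, Greek, CJK and so on are removed as well unless you add their scripts to the class.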

Also consider what you want to do with character art: (╯°□°)╯︵ ┻━┻
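To see why character art needs a deliberate decision, it can help to list which script and block each of its characters belongs to. The sketch below uses charscript and charblock from the core module Unicode::UCD; the helper name describe_chars is made up for this example.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript charblock);

# Hypothetical helper: one "U+XXXX script / block" line for each distinct
# non-ASCII character in the input, in order of first appearance.
sub describe_chars {
    my ($text) = @_;
    my %seen;
    return map {
        my $cp = ord $_;
        sprintf 'U+%04X %s / %s', $cp,
            charscript($cp) // 'Unknown', charblock($cp) // 'No_Block';
    } grep { ord($_) > 0x7F && !$seen{$_}++ } split //, $text;
}

# The table flip from above, written with escapes so the source stays ASCII
my $flip = "(\x{256F}\x{B0}\x{25A1}\x{B0})\x{256F}\x{FE35} \x{253B}\x{2501}\x{253B}";
print "$_\n" for describe_chars($flip);
```

Every character in the table flip reports the script Common, the same script most emoji belong to, so a filter based on script alone cannot tell the two apart. If you want to keep character art, you would have to allow specific blocks (Box Drawing, for example) or specific characters on top of the script check.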

Re^2: Regex For Removing Emoji
by Beaker (Beadle) on Nov 12, 2016 at 17:02 UTC
    Thanks, I will check out the modules you mentioned. I do a lot of manual text sanitization and manipulation, so I should probably try to "third party" some of it.
