http://qs321.pair.com?node_id=441430

qq has asked for the wisdom of the Perl Monks concerning the following question:

I'm maintaining a web page that organizes a list of items into ranges based on first letter: 0-9, A-E, F-H, etc.

The existing code makes no provision for non-ASCII characters and silently skips any item whose first character does not match the current character class, e.g. m/^[A-Fa-f]/. The expected input range is Latin-1, but it would be nice to have a place for other characters if they come up.

After reading this thread, and Googling, the best option seems to be to use Text::Unidecode to "convert" Unicode to ASCII before applying the ASCII regexes. This has the advantage of being quick and simple, and it ensures that every item falls under some category.
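To make the idea concrete, here is a minimal sketch of what I have in mind, not the page's actual code. The bucket table, the bucket_for() helper and the 'Other' catch-all are hypothetical names for illustration; only the 0-9, A-E and F-H ranges come from the real page. It assumes the items are already decoded character strings and that Text::Unidecode is installed from CPAN.

```perl
use strict;
use warnings;
use utf8;               # the sample strings below contain non-ASCII literals
use Text::Unidecode;    # exports unidecode() by default

# Hypothetical bucket table: label plus the ASCII regex that selects it.
# 'I-Z' is a placeholder for the remaining ranges on the real page.
my @buckets = (
    [ '0-9' => qr/^[0-9]/    ],
    [ 'A-E' => qr/^[A-Ea-e]/ ],
    [ 'F-H' => qr/^[F-Hf-h]/ ],
    [ 'I-Z' => qr/^[I-Zi-z]/ ],
);

# Transliterate to ASCII first, then apply the existing ASCII-only regexes.
# Anything that still matches nothing lands in a catch-all bucket instead
# of being silently dropped.
sub bucket_for {
    my ($item) = @_;
    my $ascii = unidecode($item);
    for my $bucket (@buckets) {
        my ($label, $re) = @$bucket;
        return $label if $ascii =~ $re;
    }
    return 'Other';
}

binmode STDOUT, ':encoding(UTF-8)';
for my $item ('Édith Piaf', 'façade', '42nd Street', '#hashtag') {
    printf "%-12s => %s\n", $item, bucket_for($item);
}
```

The original string is kept for display; only the bucketing decision uses the transliterated copy. With this arrangement 'Édith Piaf' should sort under A-E and '#hashtag' under Other.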

But this seems like a common problem, so how have others approached it?

tia, qq

Update: added regex snippet for clarity and fixed typos.