http://qs321.pair.com?node_id=1192107


in reply to printing Unicode works for some characters but not all

G'day fireblood,

For general troubleshooting of this type of problem, you need to assess the Unicode capabilities of every element involved.

Firstly, check that the code point is a valid Unicode code point with a printable character assigned to it. Note that, although the code point may be in a valid block, i.e. a range of code points, it may not be a printable character: it may be unassigned, reserved, a control character, or similar. See the "Unicode Code Charts".
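
As a rough first pass at that check, Perl's own Unicode properties can classify a code point. This is only a sketch: classify() is a name made up for illustration, and the answers reflect the Unicode version bundled with whichever Perl runs it, so a character newer than that version will be reported as unassigned. The code charts remain the authority.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# classify() is a hypothetical helper, not part of any module.
# Results depend on the Unicode version bundled with the running Perl.
sub classify {
    my ($cp) = @_;
    return 'not a Unicode code point' if $cp > 0x10FFFF;
    return 'surrogate' if $cp >= 0xD800 && $cp <= 0xDFFF;

    my $char = chr $cp;
    return 'unassigned or noncharacter' if $char =~ /\p{Unassigned}/;
    return 'control'                    if $char =~ /\p{Cntrl}/;
    return 'private use'                if $char =~ /\p{Private_Use}/;
    return 'printable'                  if $char =~ /\p{Print}/;
    return 'other';
}

# Code points taken from this post, plus one control character (BEL).
for my $cp (0x5C0D, 0x1F9C0, 0x1E95A, 0x0007) {
    printf "U+%04X: %s\n", $cp, classify($cp);
}
```

How U+1F9C0 is reported will depend on your Perl version, which is the subject of the next check.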

Next, check Perl's capabilities. If you look in the Miscellaneous section of perldoc, you'll find the perldelta pages. These will tell you which version of Unicode is supported by which version of Perl. They only mention Unicode when support for a new version is added, so finding the right page can take some hunting around: check the zero subversions (5.22.0, 5.24.0, etc.) first. Here are the details from your version up to the latest:

Perl version    Unicode version supported
5.22.0          7.0
5.24.0          8.0
5.26.0          9.0

The Unicode::UCD module (UCD = "Unicode Character Database") can provide you with a lot of other useful information. Here are just a few examples:

Which Unicode version does your current Perl support? I'm using Perl 5.26, so it shows Unicode 9; you're using 5.22, so it should show Unicode 7.

$ perl -E 'use Unicode::UCD; say Unicode::UCD::UnicodeVersion'
9.0.0

In which version of Unicode did a character first appear (given by the "Age" property)? Here's a couple: one from your post; one I happened to know was a recent addition.

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+5C0D", "Age")'
V1_1
$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1F9C0", "Age")'
V8_0

If I switch to Perl 5.22, the output from that last command becomes:

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1F9C0", "Age")'
Unassigned

Note that, in isolation, that output is indistinguishable from a code point which isn't actually assigned; however, if you did the "valid Unicode code point" check first, as suggested, you'll know the difference.

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1E95A", "Age")'
Unassigned

[See Unicode code charts (PDF): "Supplemental Symbols and Pictographs" for U+1F9C0 (a recently added emoji which looks like a wedge of cheese); "Adlam" for U+1E95A (no special significance: Adlam was alphabetically first when searching for a block with an unassigned code point; U+1E95A just happened to be in a noticeable gap between assigned code points).]
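
If you have several code points to check, the one-liners above can be gathered into a small script. Treat this as a sketch under a couple of assumptions: report() is a name invented here, and charprop() is looked up at run time because it may be missing from an older Unicode::UCD, in which case the script says so instead of dying. It relies on charinfo() returning undef for a code point that is unassigned in the bundled Unicode version.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo);

my $unicode_version = Unicode::UCD::UnicodeVersion();

# charprop() may not exist in an older Unicode::UCD; look it up at run time.
my $charprop = Unicode::UCD->can('charprop');

# report() is a hypothetical helper: one summary line per code point.
sub report {
    my ($cp) = @_;
    my $label = sprintf 'U+%04X', $cp;
    my $info  = charinfo($label);   # undef if unassigned in this Unicode version
    my $name  = $info ? $info->{name} : '(no character assigned)';
    my $age   = $charprop ? $charprop->($label, 'Age') : '(charprop unavailable)';
    return "$label  Age=$age  $name";
}

print "Perl $^V bundles Unicode $unicode_version\n";
print report($_), "\n" for 0x5C0D, 0x1F9C0, 0x1E95A;
```

As with the one-liners, an "Unassigned" Age on its own doesn't tell you whether the character is too new for your Perl or was never assigned at all.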

Next, you'll need to check the Unicode support available for your operating system, the application you're using to display the characters, fonts being used and so on. I don't have those available; however, this would (as far as I know) be valid from a Cygwin command line, and may provide some insight:

$ perl -C -E 'say "\x{5c0d}"'
對
$ echo "對"
對

Note that I used <pre> tags for that last part. When showing characters outside the ASCII range, these are a better choice than <code> tags which will often just render them as entity references (e.g. &#x5C0D;).
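
Going back to the -C switch in that one-liner: inside a script, an explicit encoding layer on STDOUT has a similar effect. The sketch below assumes your terminal expects UTF-8 (adjust the layer if it doesn't) and also shows what is actually sent: one character, three bytes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{5c0d}";

# Inside Perl this is a single character; on output it becomes UTF-8 bytes.
my $bytes = encode('UTF-8', $char);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;
printf "characters: %d, UTF-8 bytes: %d (%s)\n",
    length $char, length $bytes, $hex;

# Without an encoding layer, printing $char triggers a "Wide character" warning.
# Assumption: the terminal is set up for UTF-8.
binmode STDOUT, ':encoding(UTF-8)';
print "$char\n";
```

The first line printed is "characters: 1, UTF-8 bytes: 3 (E5 B0 8D)". If the second line doesn't show the character, the problem lies with the terminal or font rather than with Perl.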

— Ken