http://qs321.pair.com?node_id=1192107


in reply to printing Unicode works for some characters but not all

G'day fireblood,

For general troubleshooting of this type of problem, you need to assess the Unicode abilities of all elements involved.

Firstly, check that the code point is a valid Unicode code point with a printable character assigned to it. Note that, although the code point may be in a valid block, i.e. a range of code points, it may not be a printable character: it may be unassigned, reserved, a control character, or similar. See the "Unicode Code Charts".

Next, check Perl's capabilities. If you look in the Miscellaneous section of perldoc you'll find the perldelta pages. These will tell you which version of Unicode is supported by which version of Perl. A perldelta only mentions Unicode when support for a new Unicode version was added, so finding the right one can take some hunting around: check the zero subversions (5.22.0, 5.24.0, etc.) first. For your version up to the latest:

Perl version    Unicode version supported
5.22.0          7.0
5.24.0          8.0
5.26.0          9.0

The Unicode::UCD module (UCD = "Unicode Character Database") can provide you with a lot of other useful information. Here are just a few examples:

Which Unicode version does your current Perl support? I'm using Perl 5.26, so it shows Unicode 9; you're using 5.22, so it should show Unicode 7.

$ perl -E 'use Unicode::UCD; say Unicode::UCD::UnicodeVersion'
9.0.0

Which version of Unicode did a character first appear in (given by the "Age" property)? Here are a couple: one from your post, and one I happened to know was a recent addition.

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+5C0D", "Age")'
V1_1
$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1F9C0", "Age")'
V8_0

If I switch to Perl 5.22, the output from that last command becomes:

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1F9C0", "Age")'
Unassigned

Note that, in isolation, that output is indistinguishable from a code point which isn't actually assigned; however, if you did the "valid Unicode code point" check first, as suggested, you'll know the difference.

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1E95A", "Age")'
Unassigned

[See Unicode code charts (PDF): "Supplemental Symbols and Pictographs" for U+1F9C0 (a recently added emoji which looks like a wedge of cheese); "Adlam" for U+1E95A (no special significance: Adlam was alphabetically first when searching for a block with an unassigned code point; U+1E95A just happened to be in a noticeable gap between assigned code points).]

Next, you'll need to check the Unicode support available for your operating system, the application you're using to display the characters, fonts being used and so on. I don't have those available; however, this would (as far as I know) be valid from a Cygwin command line, and may provide some insight:

$ perl -C -E 'say "\x{5c0d}"'
對
$ echo "對"
對

Note that I used <pre> tags for that last part. When showing characters outside the ASCII range, these are a better choice than <code> tags which will often just render them as entity references (e.g. &#x5C0D;).

— Ken

Re^2: printing Unicode works for some characters but not all
by talexb (Chancellor) on Jun 05, 2017 at 21:03 UTC

    Was just investigating Unicode today, which was suggested by reading up on the new stuff in 5.26 .. and this post helped explain a number of questions. Great, great answer. Thank you.

    Alex / talexb / Toronto

    Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re^2: printing Unicode works for some characters but not all
by fireblood (Scribe) on Jun 10, 2017 at 21:31 UTC
    Hi Ken,

    Wow, your answer is so complete and detailed; you put a lot of effort into it. I appreciate it very much: it gives me a much deeper understanding of all of the factors involved in determining whether or not any given Unicode character can be displayed.

    I will apply your wisdom to my current project, and will upgrade to 5.26 as well. I didn't know that 5.26 was available already.

    Thanks again,
    Richard