PerlMonks
strftime does not handle Unicode characters in format argument properly (at least, not consistently)
by Bruder Savigny (Initiate) on Sep 21, 2020 at 18:36 UTC ( [id://11122019]=perlquestion )
Bruder Savigny has asked for the wisdom of the Perl Monks concerning the following question: I tried to use a UTF-8 non-breaking space (between day and name of month) in the format argument of POSIX::strftime, and hit (with Perl v5.32.0 and a UTF-8-encoded script file, without any non-default encoding settings) upon the following two oddities:
These behaviours can be demonstrated with the following script (the comments apply to the transparent space character in the format; the innocent-looking inner, i.e. non-syntactical, quotes in lines 3 and 4 are Unicode LEFT and RIGHT SINGLE QUOTATION MARK, the same as in the $string):
This outputs (line numbers added):
Note that
(I have deleted the complaints about wide characters in print for lines 3 and 4 for brevity.)

I am guessing, rather vaguely, that this is down to strftime essentially being the C function, which is not Unicode-aware, and maybe also to the way Perl identifies how strings are encoded and then "upgrades" some of them so as to harmonise their encodings (in this case under a wrong assumption), but ...:

The behaviour with a non-breaking space alone vs. (also) other non-ASCII characters definitely seems inconsistent. Why is the behaviour different between the non-breaking space and typographical quotation marks, which are all outside the ASCII block?

Also, can anything be done about it? That is, is it possible to use non-breaking spaces in a format for strftime such that they come out correctly (and without having to resort to inserting extra, likely unwanted, non-ASCII characters), and is it possible to use any non-ASCII character in the format argument without confusing Perl? (Actually, I can think only of non-breaking spaces as useful, but other cultures may very plausibly have other use cases.)
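For reference, one way to sidestep the problem (it does not explain the behaviour asked about above) is to keep every format passed to strftime pure ASCII and add the non-breaking space on the Perl side. The following is only a minimal sketch under that assumption: the helper name date_with_nbsp is hypothetical, and it uses a numeric month (%m) instead of the month name, because a locale-dependent month name could itself contain non-ASCII characters.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (not from the original script): format the pieces
# with ASCII-only formats, then join them with U+00A0 in Perl, so that
# strftime never sees a non-ASCII character in its format argument.
sub date_with_nbsp {
    my @t = @_;                           # sec, min, hour, mday, mon, year
    my $day  = strftime('%d',    @t);     # day of month, ASCII digits
    my $rest = strftime('%m.%Y', @t);     # numeric month and year, ASCII
    return join "\x{A0}", $day, $rest;    # character string with a NBSP
}

# Encode on output, so the NBSP is written as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

# mon is 0-based and year counts from 1900: this is 21 September 2020.
my $date = date_with_nbsp(0, 0, 0, 21, 8, 120);
print "$date\n";
```

Because the output handle is given an encoding layer, the returned character string can be printed directly; whether this is acceptable depends on whether splitting the format is an option at all.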