PerlMonks  

unicode lc uc fc wtf

by Anonymous Monk
on Sep 17, 2018 at 00:55 UTC ( [id://1222482]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

perldoc -f fc says using lc to compare strings is "Wrong!" and that uc is "Also wrong!" and that we should use "fc" or a regex. Comparing lc, uc and fc shows that only lc actually works, while uc fails, and fc doesn't even exist. What's going on?

perl -le 'print $]'
5.028000

perl -CS -le 'print    "\x{1e9e}"'
ẞ

perl -CS -le 'print lc "\x{1e9e}"'
ß

perl -CS -le 'print    "\x{00df}"'
ß

perl -CS -le 'print uc "\x{00df}"'
ß

perl -CS -le 'print lc "\x{1e9e}" eq    "\x{00df}" ? 1 : 0;'
1

perl -CS -le 'print    "\x{1e9e}" eq uc "\x{00df}" ? 1 : 0;'
0

perl -CS -le 'print lc "\x{1e9e}" eq lc "\x{00df}" ? 1 : 0;'
1

perl -CS -le 'print uc "\x{1e9e}" eq uc "\x{00df}" ? 1 : 0;'
0


perl -CS -le 'print fc "\x{1e9e}" eq fc "\x{00df}" ? 1 : 0;'
String found where operator expected at -e line 1, near "fc "\x{00df}""
        (Do you need to predeclare fc?)
syntax error at -e line 1, near "fc "\x{00df}""
Execution of -e aborted due to compilation errors.

perl -CS -le 'print fc("\x{1e9e}") eq fc("\x{00df}") ? 1 : 0;'
Undefined subroutine &main::fc called at -e line 1.

perldoc -f fc
    fc EXPR
    fc      Returns the casefolded version of EXPR. This is the internal
            function implementing the "\F" escape in double-quoted strings.

            Casefolding is the process of mapping strings to a form where
            case differences are erased; comparing two strings in their
            casefolded form is effectively a way of asking if two strings
            are equal, regardless of case.

            Roughly, if you ever found yourself writing this

                lc($this) eq lc($that)    # Wrong!
                    # or
                uc($this) eq uc($that)    # Also wrong!
                    # or
                $this =~ /^\Q$that\E\z/i  # Right!

            Now you can write

                fc($this) eq fc($that)

            And get the correct results.

Replies are listed 'Best First'.
Re: unicode lc uc fc wtf
by Your Mother (Archbishop) on Sep 17, 2018 at 01:57 UTC

    On a related note, which I consider required reading–

    • Code that assumes roundtrip equality on casefolding, like lc(uc($s)) eq $s or uc(lc($s)) eq $s, is completely broken and wrong. Consider that the uc("σ") and uc("ς") are both "Σ", but lc("Σ") cannot possibly return both of those.
    • Code that assumes every lowercase code point has a distinct uppercase one, or vice versa, is broken. For example, "ª" is a lowercase letter with no uppercase; whereas both "ᵃ" and "ᴬ" are letters, but they are not lowercase letters; however, they are both lowercase code points without corresponding uppercase versions. Got that? They are not \p{Lowercase_Letter}, despite being both \p{Letter} and \p{Lowercase}.
    • Code that assumes changing the case doesn’t change the length of the string is broken.
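    A minimal sketch of the three points above, assuming Perl 5.12 or later for the unicode_strings feature. The variable names are only illustrative, and the characters are written as \x{...} escapes so the file encoding does not matter.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules even for code points below 0x100

my $sigma       = "\x{03c3}";    # GREEK SMALL LETTER SIGMA
my $final_sigma = "\x{03c2}";    # GREEK SMALL LETTER FINAL SIGMA
my $sharp_s     = "\x{00df}";    # LATIN SMALL LETTER SHARP S
my $modifier_a  = "\x{1d43}";    # MODIFIER LETTER SMALL A

# 1. Round trips: both sigmas uppercase to the same capital sigma,
#    so lc(uc(...)) cannot restore the final sigma.
my $same_upper = uc($sigma) eq uc($final_sigma)        ? 1 : 0;
my $round_trip = lc(uc($final_sigma)) eq $final_sigma  ? 1 : 0;

# 2. A code point can be \p{Lowercase} and \p{Letter} without being
#    a \p{Lowercase_Letter}, and without having an uppercase mapping.
my $is_lowercase        = $modifier_a =~ /\p{Lowercase}/        ? 1 : 0;
my $is_letter           = $modifier_a =~ /\p{Letter}/           ? 1 : 0;
my $is_lowercase_letter = $modifier_a =~ /\p{Lowercase_Letter}/ ? 1 : 0;
my $uc_unchanged        = uc($modifier_a) eq $modifier_a        ? 1 : 0;

# 3. Length: uppercasing sharp s yields the two characters "SS".
my $len_before = length($sharp_s);
my $len_after  = length(uc($sharp_s));

print "same uppercase: $same_upper\n";             # 1
print "round trip survives: $round_trip\n";        # 0
print "Lowercase: $is_lowercase, Letter: $is_letter, ",
      "Lowercase_Letter: $is_lowercase_letter\n";  # 1, 1, 0
print "uc leaves it unchanged: $uc_unchanged\n";   # 1
print "length before/after uc: $len_before/$len_after\n";  # 1/2
```

    Note that without unicode_strings (or an otherwise UTF-8-flagged string), uc("\x{00df}") leaves the character alone, which matches the ß printed by the one-liner in the question.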
      There is also Tom Christiansen's Unicode Recipes which could save your life one day, or leave you a quivering mess hiding under your desk at the mere mention of Unicode.
Re: unicode lc uc fc wtf
by choroba (Cardinal) on Sep 17, 2018 at 01:00 UTC
    Your output of perldoc is missing
    This keyword is available only when the "fc" feature is enabled, or when prefixed with "CORE::"; See feature. Alternately, include a "use v5.16" or later to the current scope.

    and therefore, your oneliners are missing

    -Mfeature=fc

    This would output 0 on the problematic line. To get 1, you need to also add unicode_strings (or just change -e to -E to get both in recent Perls).
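    For anyone who prefers a script to one-liners, here is a hedged sketch of the same comparison with both features switched on. It assumes Perl 5.16 or later; the variable names and the STRASSE example are mine, not from the question.

```perl
use strict;
use warnings;
use feature qw(fc unicode_strings);   # "use v5.16;" or -E would enable both as well

my $capital_sharp_s = "\x{1e9e}";     # LATIN CAPITAL LETTER SHARP S
my $small_sharp_s   = "\x{00df}";     # LATIN SMALL LETTER SHARP S

# The three comparisons from the question.
my $lc_equal = lc($capital_sharp_s) eq lc($small_sharp_s) ? 1 : 0;
my $uc_equal = uc($capital_sharp_s) eq uc($small_sharp_s) ? 1 : 0;
my $fc_equal = fc($capital_sharp_s) eq fc($small_sharp_s) ? 1 : 0;

# CORE::fc is the spelling that works even where the fc feature is off.
my $core_fc_equal =
    CORE::fc($capital_sharp_s) eq CORE::fc($small_sharp_s) ? 1 : 0;

# A pair where lc is the one that gets it wrong: lc keeps the sharp s,
# while fc folds it to "ss".
my $upper_word = "STRASSE";
my $lower_word = "stra\x{00df}e";
my $lc_words_equal = lc($upper_word) eq lc($lower_word) ? 1 : 0;
my $fc_words_equal = fc($upper_word) eq fc($lower_word) ? 1 : 0;

print "sharp s via lc: $lc_equal\n";            # 1
print "sharp s via uc: $uc_equal\n";            # 0
print "sharp s via fc: $fc_equal\n";            # 1
print "sharp s via CORE::fc: $core_fc_equal\n"; # 1
print "STRASSE via lc: $lc_words_equal\n";      # 0
print "STRASSE via fc: $fc_words_equal\n";      # 1
```

    The last pair is why the documentation calls lc "Wrong!" even though it happened to work for the two characters in the question.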

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      Thank you! I forgot to scroll and didn't see that last part of perldoc...
