Re^4: How to reverse a (Unicode) string

by ikegami (Pope)
on Jan 09, 2011 at 23:52 UTC (#881386)

in reply to Re^3: How to reverse a (Unicode) string
in thread How to reverse a (Unicode) string

[iso-8859-1] is a unicode encoding, in that after you've decoded the character number, the number maps 1-on-1 to the Unicode space.

By that logic, UTF-8 is not a "unicode encoding". For example, U+00C2 is not encoded as the byte C2 in UTF-8 (it is encoded as the two bytes C3 82). Your choice of name for this trait is very poor.

Replies are listed 'Best First'.
Re^5: How to reverse a (Unicode) string
by JavaFan (Canon) on Jan 10, 2011 at 09:17 UTC
    You seem to be confusing a "1-to-1" mapping with the "identity function". While the identity function is a trivial "1-to-1" mapping, it's not true that every "1-to-1" mapping is the identity function.

    However, even side-stepping that, Juerd doesn't mean that byte values map 1-to-1. The mapping applies after decoding. For instance, the UTF-8 byte sequence 0xC3 0x82 decodes to C2, which does indeed map to the Unicode code point U+00C2.
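
The point about "after decoding" can be checked with a small sketch. It uses Python's built-in codecs rather than Perl's Encode, purely as an illustration; the variable names are made up for this example.

```python
# U+00C2 (LATIN CAPITAL LETTER A WITH CIRCUMFLEX) in two encodings.
ch = "\u00C2"

utf8_bytes = ch.encode("utf-8")         # two bytes: C3 82
latin1_bytes = ch.encode("iso-8859-1")  # one byte: C2

# The byte values differ between the two encodings...
assert utf8_bytes == b"\xC3\x82"
assert latin1_bytes == b"\xC2"

# ...but decoding either byte sequence yields the same code point.
decoded_utf8 = utf8_bytes.decode("utf-8")
decoded_latin1 = latin1_bytes.decode("iso-8859-1")
print(hex(ord(decoded_utf8)), hex(ord(decoded_latin1)))
```

Only in iso-8859-1 does the byte value happen to equal the code point number; in UTF-8 the code point is recovered only by decoding.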

      In that case, we're back to the original question. Are there any encodings that aren't "Unicode encodings"?

      (Strictly speaking, the mapping isn't 1-to-1. U+2660 can't be encoded in iso-8859-1. You could also say that both U+00E9 and U+0065 U+0301 encode to E9 in iso-8859-1, although Encode's encode doesn't handle that.)
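
Both observations can be tried out in a few lines. This is a sketch using Python's codecs and its `unicodedata` module, not Perl's Encode; the explicit NFC normalization step is an assumption about how one would get the decomposed form to encode, not something the thread prescribes.

```python
import unicodedata

# U+2660 (BLACK SPADE SUIT) has no iso-8859-1 representation.
try:
    "\u2660".encode("iso-8859-1")
    spade_encodable = True
except UnicodeEncodeError:
    spade_encodable = False

composed = "\u00E9"          # e with acute, precomposed
decomposed = "\u0065\u0301"  # e followed by COMBINING ACUTE ACCENT

composed_bytes = composed.encode("iso-8859-1")

# The decomposed form fails as-is, because U+0301 is outside iso-8859-1.
try:
    decomposed.encode("iso-8859-1")
    decomposed_encodable = True
except UnicodeEncodeError:
    decomposed_encodable = False

# After NFC normalization it becomes U+00E9 and encodes to the same byte.
normalized_bytes = unicodedata.normalize("NFC", decomposed).encode("iso-8859-1")

print(spade_encodable, decomposed_encodable, composed_bytes == normalized_bytes)
```

Here the encoder does not normalize on its own, which matches the remark that the encode step doesn't handle the decomposed sequence.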

        Strictly speaking, the mapping isn't 1-to-1. U+2660 can't be encoded in iso-8859-1
        The claim is that iso-8859-1 maps 1-to-1 to Unicode, not that Unicode maps 1-to-1 to iso-8859-1. A 1-to-1 mapping is also known as an injection. The claim wasn't that it's a bijection (aka 1-to-1 correspondence).
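
The injection-versus-bijection distinction can be made concrete with a short check. This is a sketch in Python, assuming its `iso-8859-1` codec as the reference decoder; the names are illustrative.

```python
# Decode every possible iso-8859-1 byte to its code point.
code_points = [bytes([b]).decode("iso-8859-1") for b in range(256)]

# Injective: 256 distinct bytes give 256 distinct code points.
is_injective = len(set(code_points)) == 256

# Not surjective onto Unicode: the highest code point reached is U+00FF,
# while Unicode code points run up to U+10FFFF.
highest = max(ord(c) for c in code_points)
is_bijection = is_injective and highest == 0x10FFFF

print(is_injective, hex(highest), is_bijection)
```

Every byte lands on its own code point, but most of Unicode, U+2660 included, is never reached.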
      No, actually, I'm not confused. When the term was introduced, it was given as the reason iso-8859-1 works without being decoded, so he indeed meant an identity mapping.
        You always have to decode. Note that Unicode is a list of integers with a meaning. iso-8859-1 is an encoding (of a subset of Unicode). UTF-8 is also an encoding. UTF-16 is another. It just happens that for the first 128 code points, the encodings in iso-8859-1 and UTF-8 are identical. But that wasn't part of Juerd's claim.
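
The overlap for the first 128 code points is easy to verify. A minimal sketch, again with Python's codecs standing in for Perl's Encode and with made-up variable names:

```python
# Code points U+0000..U+007F: compare the bytes produced by each encoding.
ascii_range = [chr(cp) for cp in range(128)]
same_below_128 = all(
    c.encode("iso-8859-1") == c.encode("utf-8") for c in ascii_range
)

# Code points U+0080..U+00FF: one byte in iso-8859-1, two bytes in UTF-8.
upper_range = [chr(cp) for cp in range(128, 256)]
differ_from_128 = all(
    c.encode("iso-8859-1") != c.encode("utf-8") for c in upper_range
)

print(same_below_128, differ_from_128)
```

From U+0080 upward the byte sequences diverge, which is why bytes cannot be treated as decoded text without knowing the encoding.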
