This is a simplified, high-level view of the relations between Perl scalar types.

                             ┌───────────────┐
                      ┌───── │   REFERENCE   │ ─────┐
                      │      │ (ROK flag on) │      │
                      │      └───────────────┘      │
      numeric context │                             │ string context
                      │                             │
                      ▼                             ▼
┌─────────────────────────┐                     ┌──────────────────────────────┐
│ NUMBER                  │   string context    │ TEXT STRING                  │
│ encoded internally      │ ──────────────────▶ │ (POK flag on)                │
│ as any of:              │                     │ encoded internally           │
│ * integer (IOK flag on) │   numeric context   │ as one of:                   │
│ * double (NOK flag on)  │ ◀────────────────── │ * iso-8859-1 (UTF8 flag off) │
└─────────────────────────┘                     │ * utf8 (UTF8 flag on)        │
                 │   ▲                          └──────────────────────────────┘
                 │   │                            ▲   │           ▲
                 │   │                            │   │           │
            pack │   │ unpack              decode │   │ encode    │
                 │   │                            │   │           │
                 ▼   │                            │   ▼           │ :encoding
              ┌─────────────────────────────────────────┐         │ PerlIO
              │ BINARY STRING                           │         │ layer
              │ (POK flag on)                           │ ◀───────┘
              │ (UTF8 flag off)                         │
              └─────────────────────────────────────────┘
                         ▲         ▲         ▲
                         │         │         │
                         │         │         │
                         ▼         ▼         ▼
                    ┌─────────────────────────────┐
                    │ OUTSIDE PERL                │
                    │ files, sockets, filenames,  │
                    │ environment, system calls   │
                    └─────────────────────────────┘

(A Perl programmer does not have to know about the internal flags ROK, IOK, NOK, POK, and UTF8, but if you're interested, read perlguts.)

Keep text and binary strings/semantics separated! (Good style anyway!)

If you don't keep them separate, and use a binary string as a text string, it is assumed to be iso-8859-1 encoded.

If you don't keep them separate, and use a text string as a binary string, one of the following things happens, with or without warnings:

1. the internal iso-8859-1 buffer is used (always the case if the internal buffer is not utf8 encoded)
2. the internal utf8 buffer is used
3. the iso-8859-1 encoded version is used
   3a. characters above U+00FF are utf8 encoded, while the rest is iso-8859-1 encoded
   3b. characters above U+00FF are reduced modulo 256
   3c. characters above U+00FF are dropped
   3d. characters above U+00FF cause an exception to be thrown

If you do keep them separate, and always convert between the two types explicitly, either by decoding and encoding or by using the :encoding layer on a filehandle, you stay in control of what happens and your program will behave more predictably.

Update: thin lines used, see discussion below.
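To make the explicit conversions concrete, here is a minimal sketch using the core Encode module. It assumes the outside data is UTF-8. The sample bytes, the variable names, and the in-memory filehandle standing in for a real file are my own choices for illustration, not anything prescribed above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: these two bytes came from outside Perl and are UTF-8 for U+00E9.
my $binary = "\xC3\xA9";

# BINARY STRING -> TEXT STRING: decode explicitly.
my $text = decode('UTF-8', $binary);
printf "%d bytes decode to %d character\n", length($binary), length($text);
# prints "2 bytes decode to 1 character"

# TEXT STRING -> BINARY STRING: encode explicitly, choosing the encoding yourself.
my $utf8_bytes   = encode('UTF-8', $text);        # "\xC3\xA9"
my $latin1_bytes = encode('ISO-8859-1', $text);   # "\xE9"

# Not keeping them separate: encoding the undecoded bytes treats them as
# two iso-8859-1 characters, so the result is double encoded.
my $double = encode('UTF-8', $binary);            # "\xC3\x83\xC2\xA9"

# The :encoding layer performs the same conversion on a filehandle.
# An in-memory handle stands in for a real file here.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
print {$out} $text;
close $out or die "close failed: $!";
# $buffer now holds the bytes "\xC3\xA9"

# NUMBER <-> BINARY STRING: pack and unpack.
my $packed = pack 'N', 258;          # 32-bit big-endian: "\x00\x00\x01\x02"
my $number = unpack 'N', $packed;    # 258
```

The double-encoded result is the typical symptom of skipping the decode step: nothing fails and no warning appears, but the output contains four bytes where two were intended.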