http://qs321.pair.com?node_id=11105691


in reply to Re^3: Proper Unicode handling in Perl
in thread Is there some universal Unicode+UTF8 switch?

Maybe other monks have current data, but I recall times where Windows editors like Notepad and Notepad++ saved "Unicode" files under UTF-16-BE

While I'm pretty sure that's accurate for older versions of notepad.exe (it matches my memory, though as of the current Win10 1903 even Notepad offers a choice of encodings, including UTF-8 and UTF-16 BE/LE), it's not correct for Notepad++. In modern Notepad++ (v7.7.1), and as far back as I could try* (v4.0 from Jan 2007), Notepad++ has listed separate encodings for UTF-8, UCS-2 LE, and UCS-2 BE, and has not called any of them by the generic "Unicode" name.

(*: I tried most major versions backwards in time, and all agreed in the results. When I tried v3.0 from 2005, it wouldn't even run on my machine, so I didn't go any farther back than that.)

  • Comment on Re^4: Proper Unicode handling in Perl (aside on Notepad++)

Re^5: Proper Unicode handling in Perl (aside on Notepad++)
by VK (Novice) on Sep 06, 2019 at 16:50 UTC
    During the initial standardization period there were a number of Unicode encoding formats, some of them really weird (the 7-bit UTF-7, for one). As of now Notepad++ is pretty straightforward and offers what is needed. Its settings list:
    • ANSI
    • UTF-8 (default, plus switch "Apply to open ANSI files")
    • UTF-8 with BOM
    • UCS-2 Big Endian with BOM
    • UCS-2 Little Endian with BOM
    • other (long-long list)
    The rule of thumb is that default UTF-8 + "Apply to open ANSI files" is the only thing one ought to use.
    Anything else is only for two distinct situations: 1) one has received an unreadable chunk of characters in a third-party file and needs to turn it into readable Unicode; 2) one is looking for new oops-type adventures for (her|him)self and for end users.
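
For the first situation (a third-party file saved with one of the "with BOM" options above), a minimal Perl sketch along these lines might help. The helper names `guess_encoding_from_bom` and `decode_with_bom` are hypothetical, not part of any module; the sketch assumes Notepad++'s "UCS-2" files can be decoded as UTF-16 (the two agree for every character in the Basic Multilingual Plane), and it falls back to UTF-8 when no BOM is found. It does not try to recognize UTF-32, whose little-endian BOM starts with the same two bytes as UTF-16LE.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess the encoding from a leading byte order mark.
# Returns undef (in scalar context) when the bytes carry no recognized BOM.
sub guess_encoding_from_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return;
}

# Hypothetical helper: decode raw bytes to a Perl character string,
# using the BOM if there is one and $fallback (default UTF-8) otherwise.
sub decode_with_bom {
    my ($bytes, $fallback) = @_;
    $fallback //= 'UTF-8';
    my $encoding = guess_encoding_from_bom($bytes) // $fallback;
    my $text = decode($encoding, $bytes);
    $text =~ s/\A\x{FEFF}//;    # the BOM decodes to U+FEFF; drop it
    return $text;
}
```

To use it, the file should be read as raw bytes first (for example with `open my $fh, '<:raw', $path`) and the whole content passed to `decode_with_bom`. For files saved with the recommended plain UTF-8 setting none of this is needed, and opening with `'<:encoding(UTF-8)'` is enough.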

      Thanks to pryrt and VK for clarifying the situation with Notepad++! Duly noted for future reference.