PerlMonks  

Re: Perl, DOS and encodings

by haj (Vicar)
on Apr 29, 2020 at 19:37 UTC ( [id://11116237]=note )


in reply to Perl, DOS and encodings

This is indeed related to the quirks of the cmd.exe console. You can control the output encoding of Perl programs with the chcp command, but this does not affect the encoding of Perl's @ARGV.

I found this superuser.com answer helpful for finding out the default encoding on my machine. It is this encoding which is applied to the parameters you pass to Perl programs, regardless of your chcp setting. So, most probably, your Windows system uses the Cyrillic default encoding, code page 1251, for input, but defaults to code page 866 for output.
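As a minimal sketch of what that means for a script: assuming the arguments arrive as cp1251 bytes and the console displays cp866 (both values are taken from this thread and will differ on other systems), the arguments can be decoded explicitly before use. The helper name decode_args is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw command line bytes into Perl character strings.
# $encoding is the encoding the arguments are assumed to arrive in.
sub decode_args {
    my ($encoding, @raw) = @_;
    return map { decode($encoding, $_) } @raw;
}

# Assumption: @ARGV is in the ANSI code page (cp1251 here),
# while the console expects the OEM code page (cp866 here).
my @args = decode_args('cp1251', @ARGV);

binmode STDOUT, ':encoding(cp866)';
print "$_\n" for @args;
```

Once decoded, the strings are ordinary Perl character strings, so the output layer takes care of converting them to whatever the console expects.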

Cygwin is another story, of course. Contemporary Unix/Linux terminals use UTF-8 as their default encoding, and this is applied when you pass data from bash (the Cygwin shell) to your Perl program.

Replies are listed 'Best First'.
Re^2: Perl, DOS and encodings
by siberia-man (Friar) on Apr 29, 2020 at 20:02 UTC
    Thank you for your response. I've just tested the suggestion from the superuser.com answer. To be honest, without your explanation that answer doesn't give many clues. Just compare the two outputs.

    As I have already said, the code page defaults to 866 (IBM CP866, the old code page in use since MS-DOS 4.01). BodyName = koi8-r refers to yet another code page, 20866. How this actually works, I don't know; cmd.exe is definitely painful.
    C:\>chcp
    Active code page: 866

    C:\>powershell -c "[System.Text.Encoding]::Default"

    IsSingleByte      : True
    BodyName          : koi8-r
    EncodingName      : Cyrillic (Windows)
    HeaderName        : windows-1251
    WebName           : windows-1251
    WindowsCodePage   : 1251
    IsBrowserDisplay  : True
    IsBrowserSave     : True
    IsMailNewsDisplay : True
    IsMailNewsSave    : True
    EncoderFallback   : System.Text.InternalEncoderBestFitFallback
    DecoderFallback   : System.Text.InternalDecoderBestFitFallback
    IsReadOnly        : True
    CodePage          : 1251

    I tested the command from my opening post with different code pages, setting it to 1251 or 65001 (UTF-8). The only encoding that works correctly for Cyrillic text on the command line is 1251. The default encoding in Cygwin is en_US.UTF-8.

    Updated:

    I tested the script by invoking it from a shell/batch script. It works correctly if the title's encoding corresponds to the encoding of the shell script. In the batch script, it is code page 1251 that has to be specified, independently of the encoding of the batch script itself.

      The relevant information is the WindowsCodePage entry. This is the encoding which cmd.exe uses to pass Cyrillic characters from your terminal input to your Perl program, and you cannot change it using chcp.
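The same two values can probably be read from Perl itself. The sketch below assumes the Win32 module is installed and recent enough to provide Win32::GetACP() and Win32::GetConsoleOutputCP(); the helper codepage_to_encoding is a made-up name that maps a code page number to the name Encode uses for it.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: map a Windows code page number to an Encode name.
# Returns undef if Encode does not know the resulting name.
sub codepage_to_encoding {
    my ($cp) = @_;
    my $name = $cp == 65001 ? 'UTF-8' : "cp$cp";
    my $enc  = find_encoding($name);
    return $enc ? $enc->name : undef;
}

# Only meaningful on native Windows Perl; skipped silently elsewhere.
if ($^O eq 'MSWin32' && eval { require Win32; 1 }) {
    # Assumption: the ANSI code page is what @ARGV arrives in,
    # and the console output code page is what chcp reports.
    my $argv_enc = codepage_to_encoding(Win32::GetACP());
    my $out_enc  = codepage_to_encoding(Win32::GetConsoleOutputCP());
    printf "\@ARGV encoding:   %s\n", $argv_enc // 'unknown';
    printf "console encoding: %s\n", $out_enc  // 'unknown';
}
```

On the system described in this thread, one would expect the first value to correspond to code page 1251 and the second to 866, but I have not verified that.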

      According to my experiments, which may be totally bogus, things get even more interesting if you write your command, including command line parameters with Cyrillic characters, into a .bat file and execute that. In that case, the chcp setting will be used to decode the batch file, but the Perl program will still receive its @ARGV in the WindowsCodePage encoding.

      So, if your batch file is UTF-8 encoded, you need to chcp 65001 and use --title-transcode=cp1251 if you pass the title as a command line parameter.
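How the script from the opening post handles --title-transcode is not shown in this thread, so the following is only a guess at the general shape: an option names the encoding of the raw title bytes, and the script decodes the title with it. The --title option, the UTF-8 default, and the function name parse_title are assumptions made for this sketch.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option handling, not the original script's code.
# Returns the title as a decoded character string, or undef if none was given.
sub parse_title {
    my @argv = @_;
    my %opt  = ('title-transcode' => 'UTF-8');   # assumed default
    GetOptionsFromArray(\@argv, \%opt, 'title=s', 'title-transcode=s')
        or die "Invalid options\n";
    return undef unless defined $opt{title};
    # Interpret the raw argument bytes in the encoding the caller named.
    return decode($opt{'title-transcode'}, $opt{title});
}

my $title = parse_title(@ARGV);
if (defined $title) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$title\n";
}
```

With a layout like this, the encoding of the batch file and the encoding of @ARGV stay separate concerns: chcp governs how cmd.exe reads the file, and the option tells the script how to read its arguments.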

        Oh, yes. I tested it, but my experiments were incomplete. At least in pure DOS, I have to specify cp1251 as the encoding for parameters on the command line. It doesn't depend on the file encoding. If I run the same script under ConEmu, I have to specify the same encoding as that of the file itself.
