http://qs321.pair.com?node_id=892086


in reply to Re^3: Printing the first letter of the Hebrew alphabet (U05D0) kills script?
in thread Printing the first letter of the Hebrew alphabet (U05D0) kills script?

You mentioned something about an "Xemacs shell". Is that a variable that can be eliminated?

In this case the xemacs shell was being used as a control rather than something to be eliminated. One way of assuring myself that this was xterm-specific behavior was to run the script in an alternative shell and see what happened. As it turned out, there was no disappearing output in the xemacs shell (which is really just a file buffer pretending to be a shell). I also dumped the output to a file instead of the terminal (as suggested above by BrowserUK), and not surprisingly it was all there: no disappearing output. This really does seem to be an xterm problem.

those normally start with ESCape (^[).

That was my first assumption too, but googling around, I see that there does appear to be some overlap between UTF-8 and xterm escape sequences. For example:

Under normal mouse mode, positions outside (160,94) result in byte pairs which can be interpreted as a single UTF-8 character; applications which do treat their input as UTF-8 will almost certainly be confused unless extended mouse mode is active. Source: http://invisible-island.net/xterm/ctlseqs/ctlseqs.html#Mouse%20Tracking

I'm not sure how that explains what I'm seeing, but that may not be the only case of overlap.

I don't know much about terminals, and less about xterm. I didn't even have xterm installed until this came up.

Wow. Many thanks for the effort you have put into this!

Re^5: Printing the first letter of the Hebrew alphabet (U05D0) kills script?
by ikegami (Patriarch) on Mar 08, 2011 at 22:00 UTC

    Under normal mouse mode, positions outside (160,94) result in byte pairs which can be interpreted as a single UTF-8 character;

    For there to be an issue, a sequence of UTF-8 bytes has to be interpreted as an escape sequence, not the other way around.

    From higher up in that linked document comes this:

    The xterm program recognizes both 8-bit and 7-bit control characters. It generates 7-bit controls (by default) or 8-bit if S8C1T is enabled.

    It proceeds to say 0x9B and ESC [ are equivalent, for example.

    More relevant, it says 0x90 and ESC P are equivalent. U+05D0 is 0xD7 0x90 in UTF-8.
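To make the overlap concrete, here is a minimal Perl sketch. It is not taken from the xterm documentation, and the helper names is_c1 and describe_byte are made up for illustration. It encodes U+05D0 as UTF-8 and flags any byte that falls in the 8-bit control range 0x80-0x9F, assuming the usual rule that an 8-bit control is equivalent to ESC followed by the byte minus 0x40.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if a byte value is an 8-bit (C1) control, 0x80-0x9F.
sub is_c1 {
    my ($byte) = @_;
    return $byte >= 0x80 && $byte <= 0x9F;
}

# Hypothetical helper: describe one byte, adding its 7-bit form if it is a C1 control.
sub describe_byte {
    my ($byte) = @_;
    return sprintf('0x%02X', $byte) unless is_c1($byte);
    # Assumption: the 7-bit equivalent is ESC followed by (byte - 0x40).
    return sprintf('0x%02X (same as ESC %s)', $byte, chr($byte - 0x40));
}

# U+05D0 is HEBREW LETTER ALEF.
my @bytes  = unpack 'C*', encode('UTF-8', "\x{05D0}");
my $report = join ', ', map { describe_byte($_) } @bytes;
print "$report\n";
```

The second byte of the encoded character is the one reported as a control, which is consistent with the 0x90 / ESC P equivalence quoted above. Whether xterm actually acts on that byte in the middle of a character is still only a guess.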

    Are these equivalent for you?

    perl -e'print "\x1B[31m", "foo", "\x1B[0m", "bar", "\n";'
    perl -e'print "\x9B31m", "foo", "\x9B0m", "bar", "\n";'

    Perhaps you can tell xterm to stop recognising the "8-bit" codes.

      From "man xterm":

      Modes for setting keyboard style:

      8-Bit Controls (8-bit-control)

      Enabled for VT220 emulation, this controls whether xterm will send 8-bit control sequences rather than using 7-bit (ASCII) controls, e.g., sending a byte in the range 128-159 rather than the escape character followed by a second byte. Xterm always interprets both 8-bit and 7-bit control sequences (see the document Xterm Control Sequences). This corresponds to the eightBitControl resource.

      (Xterm Control Sequences is the document to which you linked earlier.)

      But for some reason, the 8-bit control sequence I posted above isn't recognised by my xterm.

      Ideally, it would recognise the sequences only between characters, but maybe it's detecting the sequences in the middle of characters too.

        While a raw 0x9B doesn't work on my (non-xterm) console, the UTF-8 encoding of 0x9B works!

        # Doesn't work
        perl -we'print "\x9B31m", "foo", "\x9B0m", "bar", "\n";'
        # Works
        perl -CS -we'print "\x9B31m", "foo", "\x9B0m", "bar", "\n";'

        Neither works in an xterm for me.
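For anyone comparing the two one-liners, here is a small sketch of what each should put on the wire. The helper hex_bytes is hypothetical, and Encode is used here only to imitate the two output paths: Latin-1 stands in for printing without an encoding layer, and UTF-8 stands in for what -CS does to STDOUT.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($octets) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
}

my $csi = "\x9B";    # U+009B, the 8-bit CSI control character

# No encoding layer: the character goes out as the single byte 0x9B.
my $raw = encode('ISO-8859-1', $csi);

# With a UTF-8 layer on STDOUT: the same character goes out as two bytes.
my $utf8 = encode('UTF-8', $csi);

printf "raw:   %s\n", hex_bytes($raw);
printf "UTF-8: %s\n", hex_bytes($utf8);
```

So the two commands send different bytes for the same character: a lone 0x9B in the first case and 0xC2 0x9B in the second. A terminal that decodes UTF-8 before looking for controls would only see U+009B in the second case, which may be why only that one works on the non-xterm console.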