http://qs321.pair.com?node_id=446152


in reply to Re^2: No Control M
in thread No Control M

    I don't use "\n" because on some encodings this is not the real "\012".

What encodings? The Unicode value for \n is 0x000a [0], according to Unicode Standard Annex #13: Unicode Newline Guidelines. Sure, there is EBCDIC, but translating around the \r doesn't help fix newlines on EBCDIC.

   Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return.

I've never heard of a system that used \n\r. Do you know what generates those files?

0: 0x0a is the same as \n in standard Unix land; the Unicode equivalent is the same value with a null byte prepended (as in big-endian UTF-16).
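To make the footnote concrete, here is a minimal sketch that prints the bytes Perl's core Encode module produces for LF and CR in big-endian UTF-16. The helper name utf16be_hex is made up for illustration, and the escapes assume an ASCII-based platform rather than EBCDIC.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode one character as UTF-16BE (no BOM)
# and return its bytes as space-separated hex.
sub utf16be_hex {
    my ($char) = @_;
    my $bytes = encode('UTF-16BE', $char);
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

print 'LF: ', utf16be_hex("\x{0a}"), "\n";   # LF: 00 0a
print 'CR: ', utf16be_hex("\x{0d}"), "\n";   # CR: 00 0d
```

Both characters get the same treatment: one null byte, then the familiar ASCII value.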

Re^4: No Control M
by ww (Archbishop) on Apr 08, 2005 at 21:08 UTC
    Re the ordering of carriage return and line feed ("...some weird non-unix and non-mac files ... have first the newline, then the carriage return"), IIRC:

    'doze: 0x0d, 0x0a

    pre *n*x mac: 0x0d

    *n*x: 0x0a

    \n, in a Perlish sense, is NOT relevant; perl is compiled to use system defaults, so "\n" may be any of the above (and possibly some not mentioned), depending on where it's running.
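One way to sidestep the question of what "\n" means on a given platform is to spell the bytes out as octal escapes. The following is only a sketch: normalize_eol is a hypothetical helper name, and treating LF followed by CR as a single line ending is an assumption based on the odd files mentioned earlier in the thread. It also assumes the data was read with binmode, so no I/O layer has translated the line endings already.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite CR LF, LF CR and lone CR as a single LF.
# Two-byte sequences come first in the alternation so that a pair is
# replaced as one line ending rather than two.
sub normalize_eol {
    my ($text) = @_;
    $text =~ s/\015\012|\012\015|\015/\012/g;
    return $text;
}

my $mixed = "dos\015\012odd\012\015cr\015unix\012";
my $clean = normalize_eol($mixed);

# Count the line endings left after normalization.
my $count = () = $clean =~ /\012/g;
print "line endings: $count\n";   # line endings: 4
```

A file that mixes conventions is inherently ambiguous (LF CR LF could be read two ways), so the result should be checked against real data before relying on it.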

Re^4: No Control M
by ambs (Pilgrim) on Apr 08, 2005 at 21:03 UTC
    If your default encoding is UTF-16, \n will be two bytes. If the file is from DOS it will never match.

    Alberto Simões

      Well, if that's true, wouldn't \r be the same way?
        Good question. In my experience \r is not the same. My idea is that Perl does not change \r according to the environment.

        Alberto Simões
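On the UTF-16 point raised in this sub-thread, a small experiment may help. This sketch assumes the data is little-endian UTF-16 with DOS line endings; the sample string is invented. It counts CR LF matches first on the raw bytes and then on the decoded characters.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Invented sample: two DOS-style lines, stored as UTF-16LE bytes.
my $bytes = encode('UTF-16LE', "one\015\012two\015\012");

# On raw bytes a null byte sits between CR and LF, so the pair never matches.
my $raw_hits = () = $bytes =~ /\015\012/g;

# After decoding, CR and LF are adjacent characters again.
my $text      = decode('UTF-16LE', $bytes);
my $char_hits = () = $text =~ /\015\012/g;

printf "bytes: %d matches, characters: %d matches\n", $raw_hits, $char_hits;
# bytes: 0 matches, characters: 2 matches
```

In this experiment \r and \n are affected in exactly the same way, and decoding before matching makes the pattern work for both.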