http://qs321.pair.com?node_id=446137


in reply to Re: No Control M
in thread No Control M

I don't use "\n" because on some encodings this is not the real "\012".

Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return.
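
For readers who land on this node without the parent post, here is a minimal sketch of the kind of substitution being described. The regular expression from the parent node is not reproduced here, so the pattern and the helper name `normalize_eol` are assumptions for illustration only. It writes the characters as octal escapes instead of "\n", and tries the two-character sequences before a lone carriage return.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT the regex from the parent post): rewrite every
# line terminator to a bare line feed, using explicit octal escapes.
sub normalize_eol {
    my ($text) = @_;
    # Two-character forms come first, so CR LF and LF CR each count as
    # ONE line ending; a lone CR is tried last.
    $text =~ s/\015\012|\012\015|\015/\012/g;
    return $text;
}

# Made-up sample mixing CR LF, LF CR, lone CR and lone LF endings.
my $mixed = "dos\015\012odd\012\015old\015unix\012";
print normalize_eol($mixed);
```

The order of the alternatives matters: with the lone carriage return listed first, a CR LF pair would be turned into two line feeds rather than one.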

Alberto Simões

Re^3: No Control M
by cazz (Pilgrim) on Apr 08, 2005 at 20:19 UTC
        I don't use "\n" because on some encodings this is not the real "\012".

    What encodings? The Unicode mechanism for specifying \n is 0x000a [0], according to Unicode Standard Annex #13: Unicode Newline Guidelines. Sure, there is EBCDIC, but translating around the \r doesn't help fix newlines on EBCDIC.

       Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return.

    I've never heard of a system that used \n\r. Do you know what generates those files?

    0: 0x0a is the same as \n in standard Unix land; the Unicode equivalent just has a null byte prepended.
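
The footnote can be checked with the core Encode module. This is only a sketch: the helper name `hex_bytes` is made up for the example, and the three encodings are simply the ones relevant to this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a string and return its bytes as hex pairs.
sub hex_bytes {
    my ($encoding, $string) = @_;
    return join ' ', unpack '(H2)*', encode($encoding, $string);
}

# U+000A (line feed) in a few encodings.
for my $encoding ('UTF-8', 'UTF-16BE', 'UTF-16LE') {
    printf "%-8s %s\n", $encoding, hex_bytes($encoding, "\x{0A}");
}
```

In the big-endian form the null byte comes first and 0x0a second, which is the "null prepended" case; in the little-endian form the two bytes are swapped.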

      Re the ordering of carriage return and line feed ("...some weird non-unix and non-mac files ... have first the newline, then the carriage return"), IIRC:

      'doze: 0x0d, 0x0a

      pre *n*x mac: 0x0a, 0x0d

      *n*x: 0x0a

      \n, in a perlish sense, is NOT relevant; perl is compiled to use system defaults, so "\n" may be any of the above (and possibly some not mentioned), depending on where it's running.

      If your default encoding is UTF-16, \n will have two bytes. If the file is from DOS it will never match.
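
One reading of the two-byte point, as a hedged sketch: in UTF-16 data a byte-level pattern sees a padding byte between CR and LF, so a CR LF pattern only matches once the bytes have been decoded into characters. The sample line and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Made-up sample: a DOS-style line ending, encoded as UTF-16LE bytes.
my $bytes = encode('UTF-16LE', "one\x{0D}\x{0A}two");

# In the raw bytes, CR and LF are separated by a 0x00 byte, so the
# two-character pattern finds nothing.
my $raw_hit = $bytes =~ /\015\012/ ? 1 : 0;

# After decoding, the string holds characters and CR LF are adjacent.
my $chars = decode('UTF-16LE', $bytes);
my $decoded_hit = $chars =~ /\015\012/ ? 1 : 0;

print "raw bytes: $raw_hit, decoded: $decoded_hit\n";
```

Whatever the encoding, decoding first and matching on characters afterwards avoids depending on how many bytes each line terminator occupies.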

      Alberto Simões

        Well, if that's true, wouldn't \r be the same way?