Re^2: No Control M

http://qs321.pair.com?node_id=446137

in reply to Re: No Control M
in thread No Control M

I don't use "\n" because in some encodings it is not the real "\012".

Also, my regular expression handles some weird non-Unix, non-Mac files I've come across, which have the newline first and then the carriage return.
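A minimal sketch of that idea (not the original regular expression, which isn't quoted in this node): spell the terminators as explicit octal escapes so the pattern doesn't depend on what "\n" means locally, and try the two-character forms before the single ones. The name normalize_newlines is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every line terminator as a single LF (\012).
# The two-character forms come first in the alternation, so CR LF and
# LF CR are each consumed as one terminator rather than two.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012|\012\015|\015|\012/\012/g;
    return $text;
}

# One line in each style: DOS (CR LF), the odd LF CR, bare CR, bare LF.
my $mixed = "dos\015\012odd\012\015mac\015unix\012";
print normalize_newlines($mixed);
```

This assumes each file sticks to one convention. In a file that mixes styles, a run such as LF CR LF can be read more than one way, although this pattern still produces the same number of lines for either reading.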

Alberto Simões

Re^3: No Control M by cazz (Pilgrim) on Apr 08, 2005 at 20:19 UTC
"I don't use "\n" because on some encodings this is not the real "\012"."

What encodings? The Unicode code point for \n is 0x000a ⁰ according to Unicode Standard Annex #13: Unicode Newline Guidelines. Sure, there is EBCDIC, but translating around the \r doesn't help fix newlines on EBCDIC.

"Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return."

I've never heard of a system that used \n\r. Do you know what generates those files?

0: 0x0a is the same as \n in standard Unix land; the Unicode equivalent just has a null byte prepended.
Re^4: No Control M by ww (Archbishop) on Apr 08, 2005 at 21:08 UTC
Re the ordering of CarriageReturn and LineFeed, "...some weird non-unix and non-mac files ... have first the newline, then the carriage return."

IIRC:
'doze: 0x0d, 0x0a
pre nx mac: 0x0a, 0x0d
nx: 0x0a

\n, in a perlish sense, is NOT relevant; perl is compiled to use system defaults, so "\n" may be any of the above (and possibly some not mentioned), depending on where it's running.
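Rather than relying on memory of which platform writes which sequence, one can count what a given file actually contains. A rough sketch follows; count_terminators is a hypothetical name, and the input is assumed to be raw bytes read from a handle in binmode, so that no I/O layer has already translated CR LF.

```perl
use strict;
use warnings;

# Hypothetical helper: tally each terminator style found in a string.
# Two-character sequences are tried before single characters.
sub count_terminators {
    my ($data) = @_;
    my %name = (
        "\015\012" => 'CRLF',
        "\012\015" => 'LFCR',
        "\015"     => 'CR',
        "\012"     => 'LF',
    );
    my %seen = map { $_ => 0 } values %name;
    $seen{ $name{$1} }++ while $data =~ /(\015\012|\012\015|\015|\012)/g;
    return \%seen;
}

my $counts = count_terminators("a\015\012b\012\015c\015d\012");
printf "%-4s %d\n", $_, $counts->{$_} for sort keys %$counts;
```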
Re^4: No Control M by ambs (Pilgrim) on Apr 08, 2005 at 21:03 UTC
If your default encoding is UTF-16, \n will have two bytes. If the file is from DOS it will never match.

Alberto Simões
Re^5: No Control M by cazz (Pilgrim) on Apr 08, 2005 at 21:10 UTC
Well, if that's true, wouldn't \r be the same way?
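This is easy to check with the core Encode module. The sketch below assumes UTF-16BE (big-endian, no byte-order mark) purely for illustration. It shows that LF and CR both become two bytes, and that decoding the bytes to characters first lets an ordinary character-level substitution work again.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Encode LF and CR as UTF-16BE and look at the resulting bytes.
my $lf = encode('UTF-16BE', "\012");
my $cr = encode('UTF-16BE', "\015");

printf "LF: %d bytes (%s)\n", length($lf), unpack('H*', $lf);
printf "CR: %d bytes (%s)\n", length($cr), unpack('H*', $cr);

# Decode the byte string back to characters before matching, so the
# pattern sees one CR character followed by one LF character.
my $bytes = encode('UTF-16BE', "one\015\012two\015\012");
my $chars = decode('UTF-16BE', $bytes);
(my $clean = $chars) =~ s/\015\012/\012/g;
```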
Re^6: No Control M by ambs (Pilgrim) on Apr 08, 2005 at 21:13 UTC
