http://qs321.pair.com?node_id=446134


in reply to No Control M

  1. I prefer using dos2unix & mac2unix, as they are quite a bit faster than firing up perl, but I can see the usefulness of doing it all at once. (Though a simple shell script wrapper that calls dos2unix & mac2unix would probably still be faster.)
  2. Also, you use octal as well as escape sequences. Why not pick one way of representing characters and stick with it? \n is easier for most of us to read than \012.
  3. Your code can be faster by only mucking with files that contain \r. If a file already has Unix line endings, you are still rewriting its data in place. Try this instead:
    s/\r\n?/\n/g;
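A minimal Perl sketch of point 3, assuming each file fits in memory: it reads the file in raw mode, leaves it alone if it contains no \r, and otherwise applies s/\r\n?/\n/g and writes it back. The names normalize_eol and fix_file are hypothetical, not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert CRLF and lone CR to LF in the string referenced by $text_ref.
# Returns the number of line endings changed (0 if none).
sub normalize_eol {
    my ($text_ref) = @_;
    my $count = ($$text_ref =~ s/\r\n?/\n/g);
    return $count || 0;
}

# Rewrite $path in place only if it contains a carriage return.
# Returns the number of line endings changed.
sub fix_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> };
    close $in;
    $text = '' unless defined $text;

    # Files that already use LF only are never reopened for writing.
    return 0 unless $text =~ /\r/;

    my $count = normalize_eol(\$text);
    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return $count;
}

for my $path (@ARGV) {
    my $changed = fix_file($path);
    print "$path: $changed line endings converted\n" if $changed;
}
```

Reading and writing with :raw keeps perl's own I/O layers from translating line endings behind your back, so the regex sees the bytes exactly as they are on disk.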

Re^2: No Control M
by ambs (Pilgrim) on Apr 08, 2005 at 19:56 UTC
    I don't use "\n" because on some encodings this is not the real "\012".

    Also, my regular expression handles some weird non-Unix, non-Mac files I've found, which have the newline first and then the carriage return.

    Alberto Simões
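The expression from the original post isn't reproduced in this thread, so here is a hedged sketch (an assumption, not Alberto's actual code) of a substitution that also folds the LF-then-CR pairs described above into a single LF. It sticks to octal escapes, in keeping with the preference for exact byte values; normalize_all_eol is a hypothetical name.

```perl
use strict;
use warnings;

# Fold CRLF (\015\012), LFCR (\012\015) and lone CR (\015) into LF (\012).
# The two-byte pairs come first in the alternation, so each pair becomes
# one LF instead of two; lone LFs never match and are left as they are.
sub normalize_all_eol {
    my ($text) = @_;
    $text =~ s/\015\012|\012\015|\015/\012/g;
    return $text;
}

my $sample = "dos\015\012lfcr\012\015mac\015unix\012";
my $fixed  = normalize_all_eol($sample);
printf "%s\n", join '|', split /\012/, $fixed;    # prints dos|lfcr|mac|unix
```

One caveat of this design: mixed input can be ambiguous. "\015\012\015" is read as a CRLF followed by a lone CR, i.e. two line breaks, because the regex engine always takes the leftmost pair first.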

          I don't use "\n" because on some encodings this is not the real "\012".

      What encodings? The Unicode code point for \n is 0x000A[0], according to Unicode Standard Annex #13: Unicode Newline Guidelines. Sure, there is EBCDIC, but translating around the \r doesn't help fix newlines on EBCDIC.

         Also, my regular expression solves some weird non-unix and non-mac files I've found, which have first the newline, then the carriage return.

      I've never heard of a system that used \n\r. Do you know what generates those files?

      [0] 0x0a is the same as \n in standard Unix land; the Unicode equivalent, 0x000A, is just that byte with a null prepended.
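To see the "null prepended" point concretely, here is a small sketch using the core Encode module: it prints the code point of "\n" and its bytes in UTF-8 and in both UTF-16 byte orders. It assumes an ASCII-based perl; on EBCDIC platforms the code point of "\n" differs.

```perl
use strict;
use warnings;
use Encode qw(encode);

# On ASCII-based platforms "\n" is code point U+000A (decimal 10, octal 012).
printf "code point: U+%04X\n", ord "\n";

# UTF-8 stores it as the single byte 0a; UTF-16 adds a null byte,
# before it in big-endian order and after it in little-endian order.
for my $enc ('UTF-8', 'UTF-16BE', 'UTF-16LE') {
    printf "%-8s %s\n", $enc, unpack('H*', encode($enc, "\n"));
}
```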

        Re the ordering of carriage return and line feed ("...some weird non-unix and non-mac files ... have first the newline, then the carriage return."), IIRC:

        'doze: 0x0d, 0x0a

        pre-OS X Mac: 0x0d

        *n*x: 0x0a
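A hedged sketch for checking which of these conventions a given file actually uses: count_eol is a hypothetical helper that tallies CRLF, LFCR, lone CR and lone LF in a raw byte string, matching the two-byte pairs first so a CRLF isn't also counted as a CR plus an LF.

```perl
use strict;
use warnings;

# Tally each line-ending sequence in a raw byte string.
# Two-byte pairs are tried before single bytes, so "\015\012" counts
# once as CRLF rather than as one CR plus one LF.
sub count_eol {
    my ($bytes) = @_;
    my %name_of = ("\015\012" => 'CRLF', "\012\015" => 'LFCR',
                   "\015"     => 'CR',   "\012"     => 'LF');
    my %count = map { $_ => 0 } values %name_of;
    $count{ $name_of{$1} }++
        while $bytes =~ /(\015\012|\012\015|\015|\012)/g;
    return \%count;
}

# Read each file in raw mode so no layer rewrites the line endings first.
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$fh> };
    close $fh;
    my $count = count_eol(defined $bytes ? $bytes : '');
    print "$path: ", join(', ', map { "$_=$count->{$_}" } sort keys %$count), "\n";
}
```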

        In a Perl sense, "\n" is NOT a fixed byte value; perl is built to follow the system's conventions, so what "\n" becomes in a file may be any of the above (and possibly some not mentioned), depending on where it's running.
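In Perl terms, "\n" is a single logical character, and what lands on disk depends on the I/O layers in effect. This sketch (assuming perl 5.8 or later with PerlIO; bytes_written is a hypothetical helper) writes "a\n" through the :raw layer and through the :crlf layer that Windows perls use by default for text files, then reads the file back raw.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text to a temporary file through $layer, then return the raw bytes.
sub bytes_written {
    my ($layer, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;
    open my $out, ">$layer", $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my $bytes = do { local $/; <$in> };
    close $in;
    return $bytes;
}

printf "length of \"\\n\": %d\n", length "\n";
for my $layer (':raw', ':crlf') {
    printf "%-5s %s\n", $layer, unpack('H*', bytes_written($layer, "a\n"));
}
```

The same one-character "\n" comes out as 0a through :raw and as 0d 0a through :crlf, which is why matching on "\n" alone says little about the bytes in a file from another system.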

        If your default encoding is UTF-16, "\n" will be two bytes, so it will never match the line endings in a file that came from DOS.

        Alberto Simões
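To illustrate that claim, a small sketch (assuming the core Encode module; the sample data is made up): the two-byte UTF-16 forms of "\n" are searched for in the raw bytes of a DOS-style file, and neither occurs, because the DOS file stores its line ending as the single-byte pair 0d 0a with no null byte.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Raw bytes as they would be read from a DOS text file (made-up sample).
my $dos_bytes = "first\015\012second\015\012";

# "\n" encoded as UTF-16 in each byte order: "\000\012" and "\012\000".
my %utf16_newline = map { $_ => encode($_, "\n") } 'UTF-16BE', 'UTF-16LE';

for my $enc (sort keys %utf16_newline) {
    my $pos = index($dos_bytes, $utf16_newline{$enc});
    print "$enc newline found at: $pos\n";    # index returns -1 when absent
}
```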