PerlMonks  

End of line issues when using Mac OS X 10.5 and Windows (for Mac) products

by Ninth Prince (Acolyte)
on Sep 25, 2008 at 19:16 UTC ( [id://713707]=perlquestion )

Ninth Prince has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I'm on a MacBook Pro running Mac OS X 10.5. I created a file using Word for Mac and saved it as a text file (.txt). The file is just five lines of two-digit numbers. I then tried to read the file into an array. What I get is an array with one element. If I print that one element, it looks like my file: five lines of two-digit numbers.

When I create the data file using TextEdit (which comes installed on the Mac) and then read the file I get an array with 5 elements.

Can someone help me understand what's going on here?

Replies are listed 'Best First'.
Re: End of line issues when using Mac OS X 10.5 and Windows (for Mac) products
by ikegami (Patriarch) on Sep 25, 2008 at 20:18 UTC

    Macs formerly used CR (\x0D) as the line separator. Perl for the older Macs responded by redefining \n as CR (\x0D) and leaving the input record separator ($/) as \n.

    Newer Macs use LF (\x0A) as the line separator, like other Unix platforms. Perl builds for modern Macs should leave \n as LF (\x0A).

    I believe your Word is using the old standard (CR as the line separator) while your Perl is expecting LF. TextEdit silently(?) converts the line endings. You can do so programmatically using the following tool:

    #!/usr/bin/perl
    #
    # Converts LF, CR and CRLF line endings to the local line ending.
    #
    # Usage:
    #    fix_line_endings infile > outfile
    #    fix_line_endings < infile > outfile
    #    perl -i.bak fix_line_endings file      (In place)
    #
    use strict;
    use warnings;

    my $file = do { local $/; <> };
    $file =~ s/\x0A|\x0D\x0A?/\n/g;
    print $file;

    Update: Simplified code.

      \n is never redefined, io layer translates
        Sorry, that's not true. It's the case for Windows, but it was NOT the case for MacPerl. The string literals "\x0A" and "\n" produced different strings (chr(10) and chr(13), respectively). That's why it was best practice to use "\x0D\x0A" for internet protocol modules instead of "\r\n". The latter wasn't portable.
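As a rough illustration of that practice, here is a minimal sketch that builds a protocol-style message with the terminator spelled out as bytes. The helper name build_message and the example.com host are placeholders for illustration, not taken from any real module.

```perl
use strict;
use warnings;

# Spell out the bytes so the terminator is CR LF (13, 10) on every platform,
# regardless of what "\r" and "\n" mean to the local Perl build.
my $CRLF = "\x0D\x0A";

# Hypothetical helper: terminate each header line with CRLF and
# finish the message with an empty line.
sub build_message {
    my (@lines) = @_;
    return join('', map { $_ . $CRLF } @lines) . $CRLF;
}

# example.com is only a placeholder host.
my $message = build_message('GET / HTTP/1.0', 'Host: example.com');
```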

        Update: Elaborated.

Re: End of line issues when using Mac OS X 10.5 and Windows (for Mac) products
by broomduster (Priest) on Sep 25, 2008 at 23:57 UTC
    MSWord and other Office tools use the "old" Mac line ending (CR aka \r aka 0x0d) when producing plain text files. Other text-producing tools (including, but not limited to, TextEdit, Emacs, Nano) use the standard Unix ending (LF aka NL aka \n aka 0x0a):
    -> od -c word.txt
    0000000    1  \r   2  \r   3  \r   4  \r   5  \r
    0000012
    -> od -c textedit.txt
    0000000    1  \n   2  \n   3  \n   4  \n   5  \n
    0000012
    -> od -c emacs.txt
    0000000    1  \n   2  \n   3  \n   4  \n   5  \n
    0000012
    -> od -c nano.txt
    0000000    1  \n   2  \n   3  \n   4  \n   5  \n
    0000012
    All of the above files (created under Mac OS X) are one digit per line (as viewed in the application in question).
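If od is not at hand, a similar check can be sketched in Perl itself. This is only a rough sketch: the helper name count_line_endings and the script name count_eol.pl are made up for illustration. It reads the file raw and counts each kind of line ending.

```perl
use strict;
use warnings;

# Hypothetical helper: count each kind of line ending in a string.
sub count_line_endings {
    my ($data) = @_;
    my %count = (CRLF => 0, CR => 0, LF => 0);
    # CRLF is tried first so it is not counted as a separate CR and LF.
    while ($data =~ /(\x0D\x0A|\x0D|\x0A)/g) {
        my $name = $1 eq "\x0D\x0A" ? 'CRLF'
                 : $1 eq "\x0D"     ? 'CR'
                 :                    'LF';
        $count{$name}++;
    }
    return %count;
}

# Usage (script name is hypothetical): perl count_eol.pl word.txt
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    my %count = count_line_endings($data);
    printf "%-4s %d\n", $_, $count{$_} for qw(CRLF CR LF);
}
```

A file saved by Word as described above should report only CR endings, while the TextEdit, Emacs and Nano files should report only LF.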

    Use ikegami's code from the reply above to convert existing old-style files, and use one of the non-Office text editors for editing plain text (choose the one you are most comfortable with; other options available on Mac OS X include vi and vim).

Re: End of line issues when using Mac OS X 10.5 and Windows (for Mac) products
by philcrow (Priest) on Sep 25, 2008 at 19:34 UTC
    There must be some funny characters hidden in there by Word. You could use a command-line tool like od to try to see them.

    Phil

    My Perlish Patterns book is now available. The Gantry/Bigtop book is still at that lulu store.
Re: End of line issues when using Mac OS X 10.5 and Windows (for Mac) products
by jethro (Monsignor) on Sep 25, 2008 at 20:06 UTC

    Word for Mac seems to use a line ending character not recognized by perl. Unix/Linux and probably Mac OS X use \n (LF, ASCII 10) for line endings; Windows uses \r\n (CR/LF, ASCII 13, 10). Your Word for Mac seems to use something without any \n at all, maybe a relic from the old Mac OS 9?

    Anyway, if you really want to use Word for Mac and it doesn't have a switch to change its behaviour, you could change Perl's notion of what a line ending is by changing $/ before reading this file.
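A minimal sketch of that idea, assuming the file really does end each line with a bare CR (the helper name read_cr_lines is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: read a file whose lines end in a bare CR (\x0D).
sub read_cr_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/ = "\x0D";    # CR is the record separator inside this sub only
    my @lines = <$fh>;
    close $fh;
    chomp @lines;         # chomp strips the current $/, i.e. the trailing CR
    return @lines;
}
```

Called as my @numbers = read_cr_lines('word.txt'); this should give one element per line for such a file. Because $/ is set with local, the change is undone when the sub returns, so other reads in the program are unaffected.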

    Just to be sure, you really did save as a text file and didn't only change the filename from bla.doc to bla.txt ?

Node Type: perlquestion [id://713707]
Approved by Corion