Cross platform file I/O code

Beatnik has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
A Little History on 0D0A by jeffa (Bishop) on Mar 31, 2001 at 22:46 UTC
As Trimbach said, you should not have a problem. But why is there such concern about the compatibility of record separators across platforms in the first place? Why so much trouble? Time for a little history.
Blame it on the typewriters, or better yet, that lever or button on the right side of them, the one used to start typing on the next line. When you activate it, it does two things: it causes the carriage to 'return' to the right so the hammers are lined up on the left side of the paper (carriage return), and it causes the carriage to roll up so the hammers are lined up on the next 'line' down (line feed).
In the early 1900's teletypewriters were used to relay messages. Teletypewriters were descendants of the telegraph, and used a code similar to ASCII, called Baudot (or Murray) code. An operator could type into the teletypewriter (tty for short) and print a message on another tty far away. In this Baudot code, two special characters were designated for a Carriage Return (0x02) and a Line Feed (0x08). Baudot code went the way of the dinosaur for reasons outside the scope of this discussion, but the CR and LF characters were adopted by ASCII with different values:
`CR = 0x0D = \015 = \r`
`LF = 0x0A = \012 = \n`
Why were they adopted? For printers of course. The concept of the tty was split into two: a monitor and a printer. At this point, it was up to the various operating systems to implement their actual use. Oops.
`Macintosh: \r`
`Windows  : \r\n`
`Unix     : \n`
UPDATE: Disregard that last table, kept only for historical purposes. Here is a better table, one that shows what the three different operating systems store on disk for a logical newline (ps, thanks Mr. Stein =] )
`Unix     : \n = \012`
`Macintosh: \n = \015`
`Windows  : \n = \015\012 if the filehandle is in text mode (the default)`
`Windows  : \n = \012 if the filehandle is in binmode`
This only causes problems when you transfer ASCII files around different operating systems as binary data, in which case you should use binmode, or when you are programming with sockets, in which case you will need to set $/ to "\015\012" or just use the exported globals $CRLF and CRLF() from the Socket or IO::Socket modules.
So, the next time you find yourself cursing this confusion, just take a look at this typewriter history tree and remember that we are only human, except arhuman. :)
Jeff
R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--
UPDATE: Thanks to the anonymous monk for clearing things up.
REFERENCES:
Network Programming With Perl by Lincoln D. Stein
Code by Charles Petzold
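The socket advice in the post above (set $/ to CRLF, or use $CRLF from Socket) can be sketched roughly as follows. This is only an illustration: the in-memory filehandle and the two sample lines are assumptions standing in for a real socket, which the post does not show.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(:crlf);    # exports $CR, $LF and $CRLF

# Hypothetical stand-in for data arriving from a socket:
# two lines, each terminated by \015\012.
my $wire = "HELO example$CRLF" . "QUIT$CRLF";

# An in-memory filehandle keeps the sketch self-contained.
open my $in, '<', \$wire or die "cannot open in-memory handle: $!";
binmode $in;             # make sure no newline translation is applied

my @lines;
{
    local $/ = $CRLF;    # a record now ends at \015\012 on every platform
    while (my $line = <$in>) {
        chomp $line;     # chomp removes whatever $/ currently holds
        push @lines, $line;
    }
}
close $in;

print "$_\n" for @lines;
```

Using local keeps the change to $/ confined to the block, so code elsewhere still sees the default record separator.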
Re: A Little History on 0D0A by indigo (Scribe) on Apr 01, 2001 at 00:00 UTC
Good post. It is probably important to note that printers were the dominant output device through much of the 50's and 60's, and well into the 70's. The CR/LF was necessary for use with early computers, and was well entrenched before monitors became mainstream.
Because printers didn't allow you to back up and erase, you wrote code with a line editor. When you wanted a change, you had to figure out what line numbers to edit, print them out, and type in your revisions. This all could get pretty tedious, so various shortcuts were devised to search for patterns in a file, to replace one string with another, etc. In time, these shortcuts grew into regular expressions, which, as we all know, have proven useful long after the line editor became a thing of the past.
So, the same mechanism that gives us end of line headaches gave rise to a cornerstone of the greatest programming language ever. I think we came out ahead on this one. :)
Re: A Little History on 0D0A by Anonymous Monk on Apr 01, 2001 at 13:38 UTC
While your history seems somewhat accurate, you seem to be mistaken about what is meant by "\n" in Perl. First of all, it is not necessarily a LINE FEED character. If the folks who wrote C and Perl had wanted to escape a character for LINE FEED, then why did they not use "\l" (that is, an escaped ell, the lower case version of L)? The reason is simple: "\n" is a logical newline; yes, the n is for NEWLINE. NEWLINE is not a character in the ASCII character set. It is a contrivance to get around the platform incompatibilities that operating systems impose on what are colloquially known as "TEXT" files.
It is true that Macs end their text file lines with a CARRIAGE RETURN character, but within C, C++, Perl and several other Mac programming languages you need only specify that you want a NEWLINE in your text file, and `print FOO "A\n";` will put the code 65 then the code 13 into your text file on a Mac. It would put the code 65 then 10 into the file on Unix, and it would put the codes 65 13 10 into the file on a Microsoft operating system, assuming that a `binmode(FOO);` statement had not appeared prior to the `print` statement.
By the way, Unix was invented around 1969-1970 and predates the only OSes that use the CARRIAGE RETURN + LINE FEED combination for end of text line characters by about 10-12 years. Unix uses a LINE FEED for the end of lines in files called text files. The Unix OS was capable of sending text files to printers for years before DOS came along.
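The byte counts described in the post above can be checked on any platform with a small sketch like the one below. It assumes that the PerlIO `:crlf` layer is a fair stand-in for a Windows text-mode handle and `:raw` for a handle after binmode; the helper name bytes_written is made up for this illustration, and the classic Mac case (65 13) cannot be reproduced this way on a current perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write "A\n" through the given PerlIO layer into an
# in-memory buffer and return the byte values that actually got stored.
sub bytes_written {
    my ($layer) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} "A\n";
    close $fh or die "close failed: $!";
    return join ' ', unpack 'C*', $buffer;
}

my $text_mode = bytes_written(':crlf');    # like a Windows text-mode handle
my $bin_mode  = bytes_written(':raw');     # like a handle after binmode()

print "text mode (:crlf): $text_mode\n";
print "binmode   (:raw) : $bin_mode\n";
```

The same print statement produces different bytes depending only on the layer of the filehandle, which is the point the post makes about "\n" being a logical newline.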
Re: A Little History on 0D0A by Beatnik (Parson) on Mar 31, 2001 at 23:17 UTC
Hey,
Well, the problem is not having a uniform record separator, but having one that is applicable on the platform. If I used `\n` in my files on every platform, and only my code accessed them, it wouldn't be a problem of course... but assume someone actually has to edit something manually and finds his editor acting weird because of the `\n`'s...
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
Re: A Little History on 0D0A by Anonymous Monk on Jun 21, 2008 at 15:31 UTC
0x0D (hex) is decimal 013 and 0x0A (hex) is decimal 010 .. or am I wrong?
Re^2: A Little History on 0D0A by psini (Deacon) on Jun 21, 2008 at 15:43 UTC
0x0D is hexadecimal 0D, which is decimal 13, which is octal 15 (or 015). Similarly, 0x0A is hexadecimal 0A, which is decimal 10, which is octal 12 (or 012). I think you have been confused by the fact that the post you were commenting on used the hex and octal forms, not the decimal.
BTW, did you know that this thread is 7 years old? :)
Careful with that hash Eugene.
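The conversions in the reply above are easy to confirm in Perl itself, since a literal with a leading 0x is read as hexadecimal and one with a leading 0 as octal. A minimal sketch follows; the helper name notations is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render one character code in the three notations
# that appear in this thread.
sub notations {
    my ($code) = @_;
    return sprintf 'hex 0x%02X = decimal %d = octal %03o', $code, $code, $code;
}

# 0x0D, 13 and 015 are three spellings of the same number,
# and so are 0x0A, 10 and 012.
print 'CR: ', notations(0x0D), "\n";
print 'LF: ', notations(0x0A), "\n";
```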
Re: Cross platform file I/O code by Trimbach (Curate) on Mar 31, 2001 at 18:02 UTC
Although there is sometimes a problem reading differently formatted files on different operating systems (with the non-standard ways different OS's handle \n), I've never had the slightest amount of trouble writing files. ($/, btw, is the INPUT record separator. $\ is the OUTPUT record separator. A different thing.)
`print FILE "some line\n"` is AFAIK completely cross-platform... works as expected on UNIX, MacOS and Windows. I've written code on all three platforms for use on all three and I've never had to worry about it. You probably shouldn't either. :-D
Gary Blackburn
Trained Killer
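The difference between $/ and $\ mentioned above can be illustrated with a short sketch. The in-memory "file" and the two sample lines are assumptions made to keep the example self-contained; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = '';    # in-memory "file", so the sketch touches no disk

# Writing: $\ (output record separator) is appended after every print.
open my $out, '>', \$file or die "cannot open for writing: $!";
{
    local $\ = "\n";
    print {$out} 'first line';     # no explicit "\n" needed here
    print {$out} 'second line';
}
close $out or die "close failed: $!";

# Reading: $/ (input record separator) decides where <$in> ends a record
# and what chomp strips from the end of it.
open my $in, '<', \$file or die "cannot open for reading: $!";
my @records;
{
    local $/ = "\n";               # the default, set explicitly for illustration
    chomp(@records = <$in>);
}
close $in;

print scalar(@records), " records read back\n";
```

Setting $\ changes what print emits; setting $/ changes how input is split. Neither affects the other direction.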
Re (tilly) 1: Cross platform file I/O code by tilly (Archbishop) on Apr 01, 2001 at 00:46 UTC
This is a non-problem. While you can set $/, normally when reading or writing text files your "\n" will transparently become what it needs to be for your OS. Therefore printing "\n" to a file will (by default) do something reasonable. If you wish to change this, use binmode. But unless you are using binmode or reading the same files on multiple operating systems at once, you don't need to worry about what the record separator is.
Or put another way, the following script is portable:
`#! /usr/bin/perl -w`
`use strict;`
`print "Hello, world\n";`
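For the exception noted above, where the same files are read on more than one operating system, one possible approach is to accept all three line-ending conventions explicitly instead of relying on "\n". The sketch below is only one way to do it; the helper names split_lines and chomp_any are made up for this illustration and do not come from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a text buffer into lines, accepting CRLF, LF or
# a bare CR as the terminator. CRLF comes first in the alternation so that it
# is not treated as a CR followed by a separate LF.
sub split_lines {
    my ($text) = @_;
    return split /\015\012|\012|\015/, $text;
}

# Hypothetical helper: remove one trailing line ending of any of the three kinds.
sub chomp_any {
    my ($line) = @_;
    $line =~ s/(?:\015\012|\012|\015)\z//;
    return $line;
}

my @lines = split_lines("unix\012windows\015\012oldmac\015last");
print join('|', @lines), "\n";
print chomp_any("trailing\015\012"), "\n";
```

The octal escapes are used on purpose: \015 and \012 mean the same bytes everywhere, whereas "\r" and "\n" are logical characters. The buffer passed to split_lines should be read from a handle in binmode so that no translation happens before the split.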

