Beatnik has asked for the wisdom of the Perl Monks concerning the following question:
Hey,
I considered tweaking some of my code to be more or less platform independent, but then it struck me....
Windows doesn't use your average record separator, and neither does Mac (hence the $/). Now, instead of using a plain print FILE "some line\n"; shouldn't we be using print FILE "some line$/"; instead? (Assuming we haven't already changed $/ for one of our quick 'n' dirty tricks.)
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
A Little History on 0D0A
by jeffa (Bishop) on Mar 31, 2001 at 22:46 UTC
As Trimbach said, you should not have a problem. But why
is there such a concern with the compatibility of
record separators across platforms in the first place?
Why so much trouble?
Time for a little history:
Blame it on the typewriters - better yet, that lever or
button on the right side of them. The one used to start
typing on the next line. When you activate it, it causes
- the carriage to 'return' to the right so the hammers are
lined up on the left side of the paper (carriage return)
- the carriage to roll up so the hammers are lined up on
the next 'line' down (line feed)
In the early 1900s teletypewriters were used to
relay messages. Teletypewriters were descendants of the
telegraph, and used a code similar to ASCII, called
Baudot (or Murray) code. An operator could type into the
teletypewriter (tty for short) and print a message on
another tty far away.
In this Baudot code, two special characters were designated
for a Carriage Return(0x02) and a Line Feed(0x08).
Baudot code went the way of the dinosaur for reasons outside
the scope of this discussion, but the CR and LF characters
were adopted by ASCII with different values:
CR = 0x0D = \015 = \r
LF = 0x0A = \012 = \n
Why were they adopted? For printers of course. The concept
of the tty was split into two - a monitor and a printer.
At this point, it was up to the various operating systems
to implement their actual use. Oops.
Macintosh: \r
Windows : \r\n
Unix : \n
UPDATE: Disregard that last table, kept only for historical
purposes. Here is a better table, one that shows how the
three different operating systems interpret a logical
newline (ps, thanks Mr. Stein =] )
Unix : \n = \012
Macintosh: \n = \015
Windows : \n = \012 in memory, and on disk if the filehandle is in binmode
Windows : \n = \015\012 on disk if the filehandle is in text mode (no binmode)
This only causes problems when you transfer text files
between different operating systems as binary data, in
which case you should use binmode - or when you are programming with
sockets, in which case you will need to set $/ to "\015\012"
(in double quotes, so the escapes are interpolated)
or just use the exported globals $CRLF and CRLF() from the
Socket or IO::Socket modules.
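A minimal sketch of the socket case, assuming a line-based protocol that wants CR LF on the wire. The helper names wire_line and split_wire_lines are made up for illustration; only the :crlf export tag of Socket is the real thing here.

```perl
use strict;
use warnings;
use Socket qw(:crlf);    # exports $CR, $LF, $CRLF and CR(), LF(), CRLF()

# Hypothetical helper: terminate a protocol line with CR LF,
# whatever the local platform uses for "\n".
sub wire_line {
    my ($text) = @_;
    return $text . $CRLF;
}

# Hypothetical helper: split received data into lines,
# accepting CR LF and tolerating a bare LF as well.
sub split_wire_lines {
    my ($data) = @_;
    return split /\015?\012/, $data;
}

my $request = wire_line("GET / HTTP/1.0") . wire_line("");
printf "%vd\n", $CRLF;    # prints 13.10
```

Spelling the bytes out as \015\012 (or using $CRLF) rather than "\r\n" keeps the code independent of what "\n" means locally.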
So, the next time you find yourself cursing this confusion,
just take a look at this typewriter history tree
and remember that we are only human, except arhuman. :)
Jeff
R-R-R--R-R-R--R-R-R--R-R-R--R-R-R--
L-L--L-L--L-L--L-L--L-L--L-L--L-L--
UPDATE: Thanks to the anonymous monk for clarifying things up.
Good post.
It is probably important to note that printers were the dominant output device through much of the 50s and 60s, and well into the 70s. The CR/LF pair was necessary for use with early computers, and was well entrenched before monitors became mainstream.
Because printers didn't allow you to back up and erase, you wrote code with a line editor. When you wanted a change, you had to figure out which line numbers to edit, print them out, and type in your revisions. This could all get pretty tedious, so various shortcuts were devised to search for patterns in a file, to replace one string with another, etc. In time, these shortcuts grew into regular expressions, which, as we all know, have proven useful long after the line editor became a thing of the past.
So, the same mechanism that gives us end of line headaches, gave rise to a cornerstone of the greatest programming language ever. I think we came out ahead on this one. :)
While your history seems somewhat accurate, you seem to be mistaken about what is meant by "\n" in Perl. First of all, it is not necessarily a LINE FEED character. If the folks who wrote C and Perl had wanted to escape a character for LINE FEED, then why did they not use "\l", that is, an escaped ell (the lower case version of L)? The reason is simple: "\n" is a logical newline; yes, the n is for NEWLINE. NEWLINE is not a character in the ASCII character set. It is a contrivance to get around the platform incompatibilities that operating systems impose on what are colloquially known as "TEXT" files. It is true that Macs end their text file lines with a CARRIAGE RETURN character, but within C, C++, Perl and several other Mac programming languages you need only specify that you want a NEWLINE in your text file and:
print FOO "A\n";
will put the code 65 then the code 13 into your text file on a Mac, it would put the code 65 then 10 into the file on Unix, and it would put the codes 65 13 10 into the file on a Microsoft operating system, assuming that a binmode(FOO); statement had not appeared prior to the print statement.
By the way, Unix was invented around 1969-1970 and predates the only OSes that use the CARRIAGE RETURN + LINE FEED combination for end of text line characters by about 10-12 years. Unix uses a LINE FEED for the end of lines in files called text files. The Unix OS was capable of sending text files to printers for years before DOS came along.
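The text mode versus binmode difference can be reproduced on any platform by asking for the PerlIO layers explicitly. This is only a sketch: the helper name bytes_written is made up, the :crlf layer stands in for what a text-mode filehandle does by default on a Microsoft OS, and the byte values in the comments assume an ASCII-based platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text through the given PerlIO layer, then
# read the file back raw and return its bytes as decimal codes.
sub bytes_written {
    my ($layer, $text) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open my $out, ">$layer", $path or die "Cannot write $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";

    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    local $/;                          # slurp the whole file
    my $raw = <$in>;
    close $in;
    return join ' ', unpack 'C*', $raw;
}

print bytes_written(':raw',  "A\n"), "\n";   # 65 10     (like binmode)
print bytes_written(':crlf', "A\n"), "\n";   # 65 13 10  (like Windows text mode)
```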
Hey,
Well, the problem is not having a uniform record separator,
but having one that is applicable on the platform.
If I used \n in my files on every platform,
and only my code accessed them, it wouldn't be a problem of course...
but assume someone actually has to edit something manually and finds his editor acting weird because of the \n's...
Greetz
Beatnik
... Quidquid perl dictum sit, altum viditur.
0x0D (hex) is decimal 13 and
0x0A (hex) is decimal 10 .. or am I wrong?
0x0D is hexadecimal 0D which is decimal 13 which is octal 15 (or 015).
Similarly, 0x0A is hexadecimal 0A which is decimal 10 which is octal 12 (or 012).
I think you were confused by the fact that the post you were commenting on used the hex and octal forms, not the decimal.
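A quick way to check the three notations against each other, as a small sketch (the sub name code_forms is just for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: show one character's code in hex, decimal and octal.
sub code_forms {
    my ($char) = @_;
    my $code = ord $char;
    return sprintf "0x%02X = %d = 0%o", $code, $code, $code;
}

print code_forms("\x0D"), "\n";   # 0x0D = 13 = 015
print code_forms("\x0A"), "\n";   # 0x0A = 10 = 012
```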
BTW, did you know that this thread is 7 years old? :)
Careful with that hash Eugene.
Re: Cross platform file I/O code
by Trimbach (Curate) on Mar 31, 2001 at 18:02 UTC
Although there is sometimes a problem reading differently formatted files on different operating systems (because of the non-standard ways different OSes handle \n), I've never had the slightest amount of trouble writing files. ($/, btw, is the INPUT record separator. $\ is the OUTPUT record separator. A different thing.)
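To make the $/ versus $\ distinction concrete, here is a small sketch using an in-memory filehandle (the variable names are arbitrary):

```perl
use strict;
use warnings;

my $buffer = '';

{
    # $\ (output record separator) is appended to every print.
    local $\ = "\n";
    open my $out, '>', \$buffer or die "Cannot open in-memory file: $!";
    print {$out} "first line";
    print {$out} "second line";
    close $out;
}

my @records;
{
    # $/ (input record separator) tells readline and chomp where a record ends.
    local $/ = "\n";
    open my $in, '<', \$buffer or die "Cannot open in-memory file: $!";
    chomp(@records = <$in>);
    close $in;
}

print scalar(@records), " records: @records\n";   # 2 records: first line second line
```

Interpolating $/ into an output string happens to work while $/ still holds its default, but it says the wrong thing: $/ describes input, and "\n" is already the portable way to end a line on output.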
print FILE "some line\n" is AFAIK completely cross-platform... works as expected on UNIX, MacOS and Windows. I've written code on all three platforms for use on all three and I've never had to worry about it. You probably shouldn't either. :-D
Gary Blackburn
Trained Killer
Re (tilly) 1: Cross platform file I/O code
by tilly (Archbishop) on Apr 01, 2001 at 00:46 UTC
This is a non-problem. While you can set $/, normally when
reading or writing text files your "\n" will transparently
become what it needs to be for your OS. Therefore printing
"\n" to a file will (by default) do something reasonable.
If you wish to change this, use binmode. But unless
you are using binmode or reading the same files on
multiple operating systems at once, you don't need to worry
about what the record separator is.
Or put another way, the following script is portable:
#! /usr/bin/perl -w
use strict;
print "Hello, world\n";
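For the one case excluded above, files that were produced on another operating system, one possible approach is to read the data raw (with binmode) and normalize the line endings by hand. A minimal sketch, with a made-up helper name normalize_newlines:

```perl
use strict;
use warnings;

# Hypothetical helper: turn CR LF, bare CR, or bare LF into Perl's logical "\n".
# CR LF is listed first so a Windows line ending is not counted as two lines.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\n/g;
    return $text;
}

my $mixed = "unix\012windows\015\012oldmac\015";
my @lines = split /\n/, normalize_newlines($mixed);
print join(',', @lines), "\n";   # unix,windows,oldmac
```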