in reply to Handling Mac, Unix, Win/DOS newlines at readtime...
I would split on /\r\n?/ instead. That avoids removing blank lines.
Update: In answer to graff's reply, /\r\n?|\n/ will split files from all three platforms. I would probably just fix the original files with something based on the first regex I gave, though. Better to standardize the files right off the bat. Customizing all sorts of code to deal with all three file types will get old real quick.
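For anyone who wants to see the difference between the two patterns, here is a minimal sketch. The sample strings and the count_fields helper are made up for illustration, and it assumes a Perl where \r is chr(13) and \n is chr(10) (i.e. anything but classic MacPerl).

```perl
use strict;
use warnings;

# Count the fields that split produces for a given pattern and text.
sub count_fields {
    my ($re, $text) = @_;
    my @fields = split $re, $text;
    return scalar @fields;
}

# Hypothetical sample data: the same three lines in each convention,
# spelled with octal escapes so the bytes are unambiguous.
my %sample = (
    dos  => "a\015\012b\015\012c",
    mac  => "a\015b\015c",
    unix => "a\012b\012c",
);

for my $name (sort keys %sample) {
    printf "%-4s first=%d updated=%d\n",
        $name,
        count_fields(qr/\r\n?/,    $sample{$name}),
        count_fields(qr/\r\n?|\n/, $sample{$name});
}
```

The unix sample comes back as a single field under the first pattern and as three fields under the updated one; the dos and mac samples give three fields either way.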
-sauoq
"My two cents aren't worth a dime.";
Re: Re: Handling Mac, Unix, Win/DOS newlines at readtime...
by graff (Chancellor) on Sep 16, 2002 at 04:12 UTC
|
/\r\n?/ will fail to split lines that were created on unix systems. Eliminating blank lines might not be so bad, but if it's an issue, then:
split(/\r\n|\r|\n/);
Just doing /[\r\n]{1,2}/ will lose some blank lines on unix or mac input, and it's important to try to match the longer pattern first.
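A rough sketch of both points, using made-up input strings and a hypothetical fields helper. It assumes \r is chr(13) and \n is chr(10), so not classic MacPerl.

```perl
use strict;
use warnings;

# Return the fields produced by splitting $text on $re.
# The -1 limit keeps trailing empty fields.
sub fields {
    my ($re, $text) = @_;
    return split $re, $text, -1;
}

my $unix = "a\012\012b";   # one blank line between a and b
my $dos  = "a\015\012b";   # no blank line at all

# Longer alternative first: the blank line survives as an empty field.
my @ordered = fields(qr/\r\n|\r|\n/, $unix);

# Character class: both LFs are consumed as one separator, blank line lost.
my @char_class = fields(qr/[\r\n]{1,2}/, $unix);

# Short alternatives first: CR and LF match separately,
# so a blank line appears that was never in the file.
my @misordered = fields(qr/\r|\n|\r\n/, $dos);

print scalar(@ordered), " ", scalar(@char_class), " ", scalar(@misordered), "\n";
```

This prints "3 2 3": three fields with the ordered alternation, two with the character class, and three (one of them a phantom blank line) when the short alternatives come first.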
|
but what if a file was created on a Windows machine, but this code was being run on a Mac?
I remember reading somewhere in this thread that \r and \n have reversed semantics on the Mac (vs. *nix, Windows).
So maybe we really want the following:
split(/ \r\n | \n\r | \r | \n /x); # (yoicks!)
My $0.02,
-- jkahn
|
"but what if a file was created on a Windows machine, but this code was being run on a Mac?"
It wouldn't matter which type of system was running the Perl code.
"I remember reading somewhere in this thread that \r and \n have reversed semantics on the Mac (vs. *nix, Windows)."
Um, no, that statement hasn't been made on this thread.
My own experience has been that MS systems use "\r\n", all *nix systems use "\n" and (older) Mac systems use "\r". Nobody uses "\n\r".
And now that Mac OS X is out with a unix foundation, maybe the number of variants will drop to just two instead of three.
|
Yes, Macs have a backwards notion of what \r and \n are in ASCII (was this changed in OS X?). However, if the original poster is running the Perl script on a *nix or Windows box, it shouldn't matter.
BTW, my favorite way of dealing with the Mac's reversed notion of CR and LF is to use the octal ASCII values instead: \r = \015, \n = \012 (IIRC). You'll probably have issues with Unicode, though.
|
Re: Re: Handling Mac, Unix, Win/DOS newlines at readtime...
by bart (Canon) on Sep 16, 2002 at 23:44 UTC
|
"I would split on /\r\n?/ instead. That avoids removing blank lines."
But not on a Mac. On a Mac, the meaning of "\n" and "\r" got reversed. "\n" is what you use as native end-of-line characters, remember? And on a Mac, that's chr(13).
Also, as people tend to forget to upload their HTML as text, you often get sequences of two CR characters and one LF. You want to deal with that, too. So here's my solution:
/\015\015?\012|\015|\012/
You might want to replace each match with "\n" using s///g rather than splitting on it, so that you get one cleaned-up string to feed into HTML::Parser or similar.
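The substitution could look something like the sketch below. The normalize_newlines name and the sample HTML string are just for illustration.

```perl
use strict;
use warnings;

# Replace every line terminator, including the CR CR LF sequence
# left behind by binary-mode uploads, with a single native "\n".
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\015?\012|\015|\012/\n/g;
    return $text;
}

# Hypothetical input mixing CR CR LF, CR LF and a bare CR.
my $mangled = "<p>one</p>\015\015\012<p>two</p>\015\012<p>three</p>\015";
my $clean   = normalize_newlines($mangled);

print $clean;
```

Because the pattern is written with octal escapes, it matches the same bytes no matter what the local Perl thinks "\r" means; only the "\n" in the replacement is platform-native, which is the point.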