Unix and Windows CRLF vs LF

SavannahLion has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Unix and Windows CRLF vs LF by cdarke (Prior) on May 13, 2009 at 11:28 UTC
Presumably you have no way of knowing whether the input records are terminated with \r\n or just \n without reading the records. Therefore I wouldn't alter $/ but instead:

```perl
while (<FILE>) {
    s/\r?\n$//;    # If there is no \r, so what?
    # Random code
}
```
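A minimal sketch of cdarke's suggestion applied to input with mixed line endings. The helper name `strip_eol` and the sample data are made up for illustration; an in-memory filehandle stands in for FILE.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing LF or CRLF, whichever is present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n$//;
    return $line;
}

# Sample data (made up): one CRLF line, one LF line, one unterminated line.
my $data = "windows line\r\nunix line\nlast line without newline";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @clean;
while (my $line = <$fh>) {
    push @clean, strip_eol($line);
}
close $fh;

print "[$_]\n" for @clean;
```

Because the \r is optional in the pattern, the same loop handles both kinds of file without touching $/.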
Re: Unix and Windows CRLF vs LF by Burak (Chaplain) on May 13, 2009 at 06:02 UTC
Using foreach on the FH is not efficient. Use while instead. And Perl will automatically convert the line ending AFAIK. Try to use chomp:

```perl
while ( my $line = <FILE> ) {
    chomp $line;    # remove new line
    # Random code
}
```
Re^2: Unix and Windows CRLF vs LF by targetsmart (Curate) on May 13, 2009 at 07:11 UTC
To add more value to Burak's post:

`chomp $line; # remove new line`

From chomp: "This safer version of "chop" removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module)."

Vivek
-- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
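The documentation quoted above implies that chomp only removes exactly what $/ holds. A small sketch of what that means for a CRLF-terminated line; the helper `chomp_with` is a hypothetical name used only for this demo.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of $text with $/ temporarily set to $separator.
sub chomp_with {
    my ($text, $separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    chomp $text;
    return $text;
}

my $dos_line = "abc\r\n";

my $lf_only = chomp_with($dos_line, "\n");      # only the LF goes, the CR stays
my $crlf    = chomp_with($dos_line, "\r\n");    # both characters are removed
my $unix    = chomp_with("abc\n",   "\r\n");    # no CRLF at the end, so nothing is removed

printf "separator LF on a CRLF line leaves %d characters\n",   length $lf_only;
printf "separator CRLF on a CRLF line leaves %d characters\n", length $crlf;
printf "separator CRLF on an LF line leaves %d characters\n",  length $unix;
```

So on a system where the CR survives reading, a plain chomp leaves it in the string, and setting $/ to CRLF instead stops chomp from working on LF-only files.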
Re^2: Unix and Windows CRLF vs LF by SavannahLion (Pilgrim) on May 13, 2009 at 06:19 UTC
Oops! I meant to write while rather than foreach. I don't think the foreach in my code even works properly as it is written. I will update and make a note. On a side note, why is foreach less efficient than while? Aren't they the same in this context?

---- Thanks for your patience.
Re^3: Unix and Windows CRLF vs LF by nikosv (Deacon) on May 13, 2009 at 07:06 UTC
foreach will make a list out of the file's contents and then iterate over it, so it reads the whole file into memory. while has a small footprint since it goes through the contents line by line, and it terminates as soon as the condition evaluates to false; with foreach you go through everything in the list.
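One way to observe the difference described above is to check eof on the first pass through each kind of loop. This is only a sketch on an in-memory file with made-up data; the sub names are hypothetical.

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";

# Reports whether the handle is already at end-of-file during the first iteration.
sub at_eof_in_first_foreach_pass {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    for my $line (<$fh>) {          # list context: every line is read before the loop starts
        return eof($fh) ? 1 : 0;
    }
    return;
}

sub at_eof_in_first_while_pass {
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    while (my $line = <$fh>) {      # scalar context: one line is read per iteration
        return eof($fh) ? 1 : 0;
    }
    return;
}

print "foreach, first pass, at eof: ", at_eof_in_first_foreach_pass(), "\n";
print "while,   first pass, at eof: ", at_eof_in_first_while_pass(),   "\n";
```

With foreach the handle is exhausted before the body runs even once, which is the "whole file in memory" cost; with while two lines are still unread at that point.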
Re: Unix and Windows CRLF vs LF by planetscape (Chancellor) on May 13, 2009 at 15:56 UTC
Use dos2unix or flip (flip: Newline conversion between Unix, Macintosh and MS-DOS ASCII files) to convert. Or see Super Search: newline regex. HTH, planetscape
Re: Unix and Windows CRLF vs LF by ikegami (Patriarch) on May 13, 2009 at 16:17 UTC
On a Windows system (without binmode),

```perl
while (<$fh>) {
    # If the file contained lines ending with CRLF, $_ ends with LF
    # If the file contained lines ending with LF,   $_ ends with LF
    chomp;    # Removes LF
}
```

On other systems,

```perl
while (<$fh>) {
    # If the file contained lines ending with CRLF, $_ ends with CRLF
    # If the file contained lines ending with LF,   $_ ends with LF
    s/\r?\n\z//;    # Removes CRLF or LF
}
```

And since the latter works on Windows as well, you can just use it everywhere. The only systems where this doesn't work are old Macs.
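The two cases above can be reproduced on any platform by naming the PerlIO layer explicitly instead of relying on the platform default. The sketch below assumes a scratch file created with File::Temp; the helper `read_lines` and the sample content are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read all lines of $path through the given PerlIO layer.
sub read_lines {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "Cannot open $path: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Write one CRLF line and one LF line byte-for-byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "dos\r\nunix\n";
close $out or die "Cannot close $path: $!";

my @raw  = read_lines($path, ':raw');     # no translation: the CR is still there
my @crlf = read_lines($path, ':crlf');    # CRLF pairs arrive as a single "\n"

# Apply the portable substitution to the untranslated lines.
my @stripped = map { my $copy = $_; $copy =~ s/\r?\n\z//; $copy } @raw;

printf "raw first line:  %d characters\n", length $raw[0];
printf "crlf first line: %d characters\n", length $crlf[0];
print  "stripped: @stripped\n";
```

The :crlf read behaves like the Windows case and the :raw read like the other systems, and the substitution gives the same result either way.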
Re: Unix and Windows CRLF vs LF by bobf (Monsignor) on May 15, 2009 at 03:25 UTC
I asked a similar question in Newlines: reading files that were created on other platforms. In summary, I started with these options:

1. Use $^O, but if I understand it correctly that will just tell me about the system the program is running on, which (as exemplified here) is not necessarily the same as the system that created the file.
2. Use a regex to match the newline character(s) in the file. I think this would require slurping the whole file and then doing something like `if( $file =~ m/\015$/ )` (which assumes the file will end with a newline) or `if( $file =~ m/\015(?!\012)/ )` (which doesn't), setting $/ according to what matched, and re-reading the file line-by-line.
3. Preprocess the input file to convert all newline characters to the current system's newline character. I experimented a little, and I think this will work:

```perl
$file =~ s[(\015)?\012(?!\015)][\n]g;
$file =~ s[(\012)?\015(?!\012)][\n]g;
```

I ended up implementing the preprocessing solution, but I would probably use binmode if I were to do it today.
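For comparison, here is a sketch of the preprocessing idea written as a single alternation rather than bobf's two substitutions. It is a different, simpler pattern (it does not treat LF followed by CR as one newline), and it assumes the text was slurped without any newline translation, for example after binmode. The helper name `normalize_newlines` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF, lone CR, and lone LF to "\n" in one pass.
# CRLF is listed first so that the pair is replaced as a unit.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\015\012|\015|\012/\n/g;
    return $text;
}

# Sample data (made up): DOS, old Mac, and Unix line endings in one string.
my $mixed = "dos\015\012mac\015unix\012";
my @lines = split /\n/, normalize_newlines($mixed);

print scalar(@lines), " lines: @lines\n";
```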
Re: Unix and Windows CRLF vs LF by SavannahLion (Pilgrim) on Jun 05, 2009 at 06:40 UTC
Thank you for your help. Ultimately, I found a hint at http://perldoc.perl.org/perlfaq6.html describing a method on file streaming. The specific code example I ended up studying (it took me about three hours of reading to figure out what (?s) did, and another hour to figure out why $_ needed to be localized) is:

```perl
local $_ = "";
while( sysread FH, $_, 8192, length ) {
    while( s/^((?s).*?)your_pattern/ ) {
        my $record = $1;
        # do stuff here.
    }
}
```

Yes, I know the bug in this example. It's exactly how it is on the perldocs. Took me about a half hour of debugging to figure it out. Just about drove me nuts. In any case, many thanks for the pointers, tips and hints. I needed some starting points and that is exactly what I got.
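A runnable sketch of the same chunked-read technique, under some assumptions. The quoted snippet's substitution has no replacement part, so it cannot compile as shown; whether that is the bug meant above is not stated. The version below gives the substitution an empty replacement, uses \r?\n in place of your_pattern, and reads into a lexical buffer instead of a localized $_. The sub name `read_records`, the tiny chunk size, and the scratch file are made up for the demo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read $path in $chunk_size-byte pieces and return the
# records, where a record ends at LF or CRLF.
sub read_records {
    my ($path, $chunk_size) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";

    my @records;
    my $buf = '';
    # The 4th argument appends each chunk to what is left in the buffer.
    while (sysread $fh, $buf, $chunk_size, length $buf) {
        # Peel off every complete record currently in the buffer.
        while ($buf =~ s/^(.*?)\r?\n//s) {
            push @records, $1;
        }
    }
    push @records, $buf if length $buf;    # last record had no terminator
    close $fh;
    return @records;
}

# Scratch file with mixed endings, written byte-for-byte.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "one\r\ntwo\nthree";
close $out or die "Cannot close $path: $!";

# A chunk size of 4 forces a CRLF pair to be split across two reads.
my @records = read_records($path, 4);
print "[$_]\n" for @records;
```

Because a record is only removed once its LF has arrived, a CR at the very end of one chunk simply waits in the buffer until the next read supplies the LF.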
Re^2: Unix and Windows CRLF vs LF by ikegami (Patriarch) on Jun 05, 2009 at 08:46 UTC
"Another hour to figure out why $_ needed to be localized"

Because you change its value. Clobbering your caller's variables isn't nice.

"Yes, I know the bug in this example. It's exactly how it is on the perldocs"

perlbug

