http://qs321.pair.com?node_id=800590


in reply to Re^2: parsing csv with Text::ParseWords
in thread parsing csv with Text::ParseWords

In Perl source, the end-of-line (EOL) is just "\n". On Unix that is literally all there is in the file: a single LF. On Windows, a line ends on disk with the two-byte sequence \r\n, and Perl's text-mode I/O translates between that and "\n". I would assume that you don't need to specify EOL for your CSV module at all: leave the EOL => 'x' setting off and let the default do its work.

I have moved files between Unix and Windows, and Perl can read files created in either place. When I save a file under Unix, EOL is just \n. When saved under Windows it is \r\n. The Windows Perl can read the Unix Perl's file and vice versa. My normal text editor, TextPad, can do the same thing.

If you process a file on Windows that came from a Unix system, then when Perl writes it, it will put in the \r\n sequence for each Perl "\n". When the Unix Perl writes a file that came from Windows, it just puts in \n instead of \r\n (as long as the stray \r was removed when the line was read).

So in Perl, with print "qwerty\n"; the \n is a single character inside the program but may end up as 2 bytes in the file, depending upon which OS you are running Perl under.
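A minimal sketch of that translation, assuming nothing beyond core Perl. The helper names bytes_written and hex_bytes are made up for this illustration; the :raw and :crlf PerlIO layers stand in for "Unix Perl" and "Windows text mode" so that the result is the same on any OS.

```perl
use strict;
use warnings;

# Write the string "x\n" through the given PerlIO layer into an
# in-memory buffer and return the bytes that were actually produced.
sub bytes_written {
    my ($layer) = @_;
    my $buf = '';
    open my $fh, ">$layer", \$buf or die "open failed: $!";
    print {$fh} "x\n";
    close $fh or die "close failed: $!";
    return $buf;
}

# Render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord $_ } split //, $bytes;
}

printf "length of \"\\n\" in Perl: %d\n", length "\n";               # 1
printf ":raw  writes: %s\n", hex_bytes(bytes_written(':raw'));     # 78 0a
printf ":crlf writes: %s\n", hex_bytes(bytes_written(':crlf'));    # 78 0d 0a
```

Inside the program "\n" is always one character; only the I/O layer decides whether one or two bytes reach the file.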

If you could explain this problem more with an example, that would be helpful. This is a well-known common problem.

I don't know the full history of why Windows did it this way. But in ancient mechanical paper tape days, each line ended with carriage return (\r), line feed (\n), rubout (DEL). The teletype machine was dumb and needed the \r to return the print head to the start of the line and the \n to advance the paper. The rubout (all 8 positions punched) was to keep mechanical fingers lubricated via the oil on the tape. The ASR 33 teletype was a "dodo bird" even by the time of DOS. Anyway, this EOL problem is well known and there are solutions.

Re^4: parsing csv with Text::ParseWords
by GertMT (Hermit) on Oct 12, 2009 at 11:44 UTC

    Thanks for the information. I've had serious problems parsing a CSV file produced by a Filemaker database on a Mac. I'll have the export script from Filemaker changed (Filemaker has various options and we may have picked the wrong one). I've tried almost everything: \l, \r, \012, \015, \r\n (maybe not this last one, but I've made a note of it now).

    Hopefully there will be no more trouble with the new export format.

    thanks

      As a rule of thumb, DOS, Windows, OS/2 and many internet protocols use CR+LF, classic Macs use CR only, Unix and its derivatives including modern Macs and the remaining internet protocols use LF only. They all use ASCII. Things become worse on EBCDIC systems, typically made by IBM. See Newlines in perlport.
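      The three ASCII conventions above can be told apart by counting byte sequences in the raw data. This is only a sketch: count_line_endings is a made-up helper, and the sample string is invented for illustration. For a real file, the bytes would have to be read through the :raw layer so that nothing is translated first.

```perl
use strict;
use warnings;

# Count each line-ending convention in a string of raw bytes.
# The CR+LF alternative comes first in the pattern, so the two
# halves of a pair are never counted again as a lone CR or LF.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    while ($bytes =~ /(\015\012|\015|\012)/g) {
        my $eol = $1;
        if    ($eol eq "\015\012") { $count{crlf}++ }
        elsif ($eol eq "\015")     { $count{cr}++ }
        else                       { $count{lf}++ }
    }
    return \%count;
}

# Invented sample data with DOS-style line ends.
my $sample = qq{"a","b"\015\012"c","d"\015\012};
my $c = count_line_endings($sample);
print "CRLF=$c->{crlf} CR=$c->{cr} LF=$c->{lf}\n";   # CRLF=2 CR=0 LF=0
```

      A file that reports only CR is most likely a classic Mac export, and a mix of counts usually means the file was edited on more than one system.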

      Look at a hex dump of the export, it will surely help more than guessing.

      od -t x1c demo.csv
      0000000 22 48 65 61 64 69 6e 67 20 31 22 2c 22 48 65 61
                "  H  e  a  d  i  n  g     1  "  ,  "  H  e  a
      0000020 64 69 6e 67 20 32 22 2c 22 48 65 61 64 69 6e 67
                d  i  n  g     2  "  ,  "  H  e  a  d  i  n  g
      0000040 20 33 22 2c 22 48 65 61 64 69 6e 67 20 34 22 0d
                   3  "  ,  "  H  e  a  d  i  n  g     4  " \r
      0000060 0a 22 44 61 74 61 20 31 22 2c 22 44 61 74 61 20
               \n  "  D  a  t  a     1  "  ,  "  D  a  t  a
      0000100 32 22 2c 22 44 61 74 61 20 33 22 2c 22 44 61 74
                2  "  ,  "  D  a  t  a     3  "  ,  "  D  a  t
      0000120 61 20 34 22 0d 0a 22 4d 6f 72 65 20 31 22 2c 22
                a     4  " \r \n  "  M  o  r  e     1  "  ,  "
      0000140 4d 6f 72 65 20 32 22 2c 22 4d 6f 72 65 20 33 22
                M  o  r  e     2  "  ,  "  M  o  r  e     3  "
      0000160 2c 22 4d 6f 72 65 20 34 22 0d 0a
                ,  "  M  o  r  e     4  " \r \n
      0000173

      This particular CSV file ends each line with the two bytes 0x0D and 0x0A, which is equal to \015\012 on ALL platforms. If you read the file on a Unix-based system or in binary mode on a DOS-based system, you can also use \r\n. If you read the file in text mode on a DOS-based system, use only \n. You can get the same behaviour on a Unix-based system using the :crlf PerlIO layer.
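      As a rough sketch of the :crlf approach combined with Text::ParseWords (the module this thread is about): parse_crlf_csv is a made-up helper name, the data is a shortened, invented version of the file in the dump, and an in-memory filehandle stands in for the real export.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split CSV-like data into rows of fields. The :crlf layer turns each
# \015\012 pair into "\n" on input, so chomp leaves no stray \r behind.
sub parse_crlf_csv {
    my ($data) = @_;
    open my $fh, '<:crlf', \$data or die "open failed: $!";
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        # A false "keep" argument strips the quotes around each field.
        push @rows, [ parse_line(',', 0, $line) ];
    }
    close $fh;
    return @rows;
}

my $data = qq{"Heading 1","Heading 2"\015\012"Data 1","Data 2"\015\012};
print join('|', @$_), "\n" for parse_crlf_csv($data);
# Heading 1|Heading 2
# Data 1|Data 2
```

      Without the layer, the same effect can be had by reading in binary mode and removing the line end explicitly, for example with s/\015?\012\z//, which also copes with files that use LF only.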

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)