http://qs321.pair.com?node_id=491485


in reply to Thoughts on designing a file format.

All good points ++

I read a paper once that explained very clearly why the two-character line endings (CR, LF) in DOS were a mistake. I have no idea where it was, but Wikipedia echoes the sentiment. Either way, documenting the line ending is OK, but using the line endings appropriate for your OS is a better approach.

Two points I would add

Hence any data file parsing I do usually ends up beginning like this:

next if /^\s*#/;    # skip comment lines
next if /^\s*$/;    # skip blank lines
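For context, here is a minimal sketch of how those two tests might sit inside a complete read loop. The `parse_lines` helper, the sample data and its key=value layout are all assumptions made up for illustration; they are not part of any real format discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the data lines from a filehandle,
# skipping comment lines and blank lines.
sub parse_lines {
    my ($fh) = @_;
    my @records;
    local $_;                # do not clobber the caller's $_
    while (<$fh>) {
        next if /^\s*#/;     # skip comment lines (optionally indented)
        next if /^\s*$/;     # skip empty or whitespace-only lines
        chomp;
        push @records, $_;
    }
    return @records;
}

# Assumed sample data, read through an in-memory filehandle.
my $data = "# a comment\n\nname=foo\n   # indented comment\nsize=42\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @records = parse_lines($fh);
close $fh;

print "$_\n" for @records;
```

Only the `name=foo` and `size=42` lines survive; the comment lines and the empty line are dropped before any real parsing starts.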

--
Murray Barton
Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re^2: Thoughts on designing a file format.
by demerphq (Chancellor) on Sep 13, 2005 at 07:27 UTC

    I prefer to use network line endings (CRLF) because that is the network standard, and because, quite simply, there will come a day when your file needs to be read by someone whose most advanced tool for reading it is Excel. Likewise, I tend to use csv so that cutting and pasting from the file into an Excel workbook works correctly, not to mention that for the type of data I use, embedded tabs are never a problem, but embedded commas occasionally are.

    ---
    $world=~s/war/peace/g

      I'm missing something here. On DOS, if you write print FILE "some text\n"; you will get "\r\n" in the file. If you do the same thing on Unix, you get just "\n". What are you outputting? Are you setting $INPUT_RECORD_SEPARATOR and $OUTPUT_RECORD_SEPARATOR to something other than the default? Otherwise chomp, for example, is going to break: it will remove "\r\n" on DOS but just "\n" on Unix, leaving a "\r" at the end of every line. It seems like a lot of trouble to deal with something that FTP clients do automatically. If I copy your program and data file over to Unix, do I then have to change the line endings back to CR/LF before it works?
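To make the stray "\r" concrete, here is a rough sketch, assuming the file was written with CRLF endings and is then read with no platform translation (as it would be on Unix). The `strip_eol` helper is a hypothetical name for illustration; it is one possible workaround, not something anyone in this thread said they use.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing CRLF or LF, whichever is present.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;
    return $line;
}

# Assumed sample: CRLF-terminated data read through an in-memory handle.
my $data = "first\r\nsecond\r\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;    # make sure no CRLF translation happens on any platform

while (my $line = <$fh>) {
    my $chomped = $line;
    chomp $chomped;                  # removes only "\n"; the "\r" stays
    my $clean = strip_eol($line);    # removes "\r\n" or a bare "\n"
    printf "chomp leaves %d chars, strip_eol leaves %d chars\n",
        length $chomped, length $clean;
}
close $fh;
```

With the default record separator, chomp leaves one character more than strip_eol on every line, which is exactly the leftover "\r".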

      --
      Murray Barton
      Do not seek to follow in the footsteps of the wise. Seek what they sought. -Basho

Re: Thoughts on designing a file format.
by jonadab (Parson) on Sep 13, 2005 at 17:30 UTC
    I read a paper once that explained very clearly why the two-character line endings (CR, LF) in DOS were a mistake

    Now, let me explain why two-character line endings in DOS were *not* a mistake...