
Re: Parsing a text file without newlines

by bart (Canon)
on Dec 14, 2004 at 13:18 UTC ([id://414697])


in reply to Parsing a text file without newlines

My guess is that your local file has, for example, only CR characters as end of line markers, and you upload (FTP) in text mode... Whoops, all the CR characters are gone. Are you by any chance using a Mac on either side of the connection?

Anyway, if that's the cause, consider uploading in binary mode. That may require an extra line-ending conversion on one side of the transfer, which a Perl script can handle. Alternatively, make your parsing script more flexible so that it accepts any of the conventional line endings (CR only, LF only, CR+LF).
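A minimal sketch of the "more flexible parser" idea, written in Python purely for illustration (the same alternation works as a Perl split pattern). The name split_records and the latin-1 decoding are assumptions made for this example, not something from the original question.

```python
import re

# CR+LF must come first in the alternation so that a CRLF pair
# is treated as one line ending rather than two.
EOL = re.compile(r"\r\n|\r|\n")

def split_records(raw):
    """Split raw file bytes into lines, accepting CR, LF, or CR+LF.

    Assumption: the data is in a single-byte encoding, so latin-1
    decoding is lossless here. Adjust for your real encoding.
    """
    text = raw.decode("latin-1")
    lines = EOL.split(text)
    # A trailing line ending leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```

Read the file in binary mode before calling this, so that the platform's own newline translation does not interfere. Note that this only helps while the line endings still exist in the file; if the transfer has already removed them, the file has to be uploaded again.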

Your file, as you have shown here, is kaput. Upload it again, more carefully this time.

Re^2: Parsing a text file without newlines
by Fletch (Bishop) on Dec 14, 2004 at 13:40 UTC

    You've got that reversed. FTP's text mode (usually the default, or enabled with the ascii command) converts line endings to the platform's native form; binary mode preserves the contents of the file verbatim.

      The thing is, uploading a file in text mode from, say, Windows to Linux will simply strip all CR characters, whether or not there are LF characters present. All you have left is one long line.

        That seems like broken behavior on either the client or ftpd's part then. Quoting from the FTP RFC:

        3.1.1.1. ASCII TYPE

        This is the default type and must be accepted by all FTP implementations. It is intended primarily for the transfer of text files, except when both hosts would find the EBCDIC type more convenient.

        The sender converts the data from an internal character representation to the standard 8-bit NVT-ASCII representation (see the Telnet specification). The receiver will convert the data from the standard form to his own internal form. In accordance with the NVT standard, the <CRLF> sequence should be used where necessary to denote the end of a line of text. (See the discussion of file structure at the end of the Section on Data Representation and Storage.)

        Using the standard NVT-ASCII representation means that data must be interpreted as 8-bit bytes.

        The Format parameter for ASCII and EBCDIC types is discussed below.

        As that reads, the protocol says that in ascii mode you convert to the network line ending (CRLF) when sending, and from CRLF to native on receipt. If your client or server is just stripping CRs blindly, it's not living up to the spec.

        I did just now test this with the stock NT 2000 ftp.exe and vsftpd that I had installed on RH9 and it did just strip CRs, so the original post was correct about how things work (I still say it's broken though :).
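
The difference between the two behaviors discussed in this thread can be sketched as follows. This is an illustrative model in Python, not the code of any real FTP client or server: to_wire, from_wire and strip_cr are hypothetical names, and strip_cr is only an assumption about what "just strip CRs" amounts to.

```python
def to_wire(text, native_eol):
    """Sender side as the RFC describes it: native line endings become CRLF.

    Assumption: text uses only native_eol as its line ending.
    """
    return text.replace(native_eol, "\r\n")

def from_wire(wire, native_eol):
    """Receiver side as the RFC describes it: CRLF becomes the native ending."""
    return wire.replace("\r\n", native_eol)

def strip_cr(data):
    """Blind CR removal, modeling the behavior observed in the test above."""
    return data.replace("\r", "")

# A file whose only line-ending character is CR (old Mac convention).
cr_only = "first\rsecond\r"

# Conversion per the RFC keeps the line structure on an LF-native receiver.
per_rfc = from_wire(to_wire(cr_only, "\r"), "\n")

# Blind stripping removes the only line endings the file had.
stripped = strip_cr(cr_only)
```

With these definitions per_rfc keeps two separate lines, while stripped collapses to a single line. For a file that really is CR+LF, strip_cr happens to give the right LF-only result, which may be why the shortcut goes unnoticed most of the time.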
