in reply to Re: Re: Re: FTP and checksum
in thread FTP and checksum

The last part is indeed part of my argument. Not only are there checksums at multiple levels in most networks (Ethernet, TCP, perhaps others, depending upon the transmission method), but most modern networks have extremely low error rates, particularly if the admin has taken any care at all on the important links.

Secondly, the TCP checksum is NOT calculated on a hop-by-hop basis. The TCP data should NEVER be modified on its way across the network (modern QoS implementations notwithstanding). It is the layer 2 headers and checksums that are stripped and rebuilt at every hop. IP (layer 3), TCP (layer 4), and everything above that should not be changed from end to end.

Thirdly, the layer 2 checksums are going to be checked at every switch across the network. Layer 3 checksums are going to be checked at every router across the network. Both of these AND the layer 4 TCP checksums are going to be verified at the receiving FTP server. While you argue that the possibility of a multibit error that would create the same checksum is non-zero, I would argue that the possibility of a multibit error that would allow ALL THREE checksums to remain the same AND still cause the data to be accepted at the far end (e.g. the TCP sequence numbers, IP addresses, MAC addresses, etc. weren't part of what changed) is either zero, or so remote that it isn't worth discussing. At that level, the user is more likely to get struck by lightning, thus removing his or her concern about file integrity.

In your example of the financial institution, I would expect the net admin to pay attention to the output of /sbin/ifconfig ethX and the statistics on the switches, looking for bad packets. I would also expect care to be taken that common things like duplex mismatches are not allowed to occur.
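For instance, a quick Perl sketch along these lines could flag interfaces that are accumulating errors. This assumes the classic net-tools ifconfig output format (RX/TX lines with packets/errors/dropped counters), and eth0 is just an example name:

    #!/usr/bin/perl
    # Sketch: scan ifconfig output for RX/TX error counters and
    # warn if any are non-zero. Assumes classic net-tools format.
    use strict;
    use warnings;

    my $iface = shift || 'eth0';
    $iface =~ /^\w+$/ or die "bad interface name\n";

    for my $line (`/sbin/ifconfig $iface`) {
        if ($line =~ /(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+)/) {
            my ($dir, $pkts, $errs, $drops) = ($1, $2, $3, $4);
            warn "$iface $dir: $errs errors, $drops drops in $pkts packets\n"
                if $errs or $drops;
        }
    }

Run that out of cron and a slowly corrupting link (bad cable, duplex mismatch) shows up long before anyone starts doubting their file transfers.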

Even further, other methods of checksumming, such as MD5, and even the use of 'diff', are not proof against the kind of multibit errors you're talking about. Even 'diff' does not (by default) do a byte-by-byte comparison between two files.
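For the record, application-level checksumming is trivial to do from Perl anyway. A minimal sketch using the core Digest::MD5 module, which you would run on both ends and compare the output of (the usage message is illustrative):

    #!/usr/bin/perl
    # Sketch: compute an MD5 digest of a file, md5sum-style,
    # using the core Digest::MD5 module.
    use strict;
    use warnings;
    use Digest::MD5;

    my $file = shift or die "usage: $0 <file>\n";
    open my $fh, '<', $file or die "open $file: $!\n";
    binmode $fh;    # treat the file as raw bytes
    print Digest::MD5->new->addfile($fh)->hexdigest, "  $file\n";
    close $fh;

But again: this is just one more checksum on top of the three the network already verified.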

Lastly, in the case of a financial institution, I would really expect them to be using SCP instead of FTP in the first place.

If the network is considered unreliable, fixing the network is preferable to adding extra layers of data checking. If there is no way to consider the network reliable - perhaps for political or philosophical reasons - just transfer the file twice and do a byte-by-byte diff. If they don't match exactly, throw a warning and don't attempt another transfer until the issue is resolved.
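As a sketch of that fallback, the core File::Compare module does exactly that byte-for-byte equality check between two files (the filenames and messages here are illustrative):

    #!/usr/bin/perl
    # Sketch: after transferring the file twice, compare the two
    # copies byte-for-byte with the core File::Compare module.
    use strict;
    use warnings;
    use File::Compare;

    my ($copy1, $copy2) = @ARGV;
    die "usage: $0 <copy1> <copy2>\n" unless defined $copy2;

    my $result = compare($copy1, $copy2);  # 0 = same, 1 = differ, -1 = error
    if ($result == 0) {
        print "Copies match; transfer looks clean.\n";
    } elsif ($result == 1) {
        warn "WARNING: copies differ -- do not retry until resolved.\n";
    } else {
        die "Comparison failed: $!\n";
    }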

--Rhys