http://qs321.pair.com?node_id=11136330


in reply to Re^13: Reverse download protocols
in thread Reverse download protocols [solved]

There are delta transfer algorithms (see rsync and librsync) where the client updates a local file according to the differences sent by the server. They use rolling checksums so that only the differing parts need to be transferred. There's also BitTorrent, where the client knows the checksum of each fragment of the file and asks peers only for the chunks it doesn't have yet. Both of these methods require some prior knowledge on the client's side.
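To give a flavour of the first approach: rsync's weak checksum is an Adler-32-style pair of sums that can be slid along a file one byte at a time in constant time, which is what makes scanning for matching blocks cheap. Below is a minimal sketch in Python; the function names are illustrative, not librsync's actual API:

    # Sketch of an rsync-style weak rolling checksum (not librsync's API).
    MOD = 1 << 16

    def weak_checksum(block: bytes) -> tuple[int, int, int]:
        """Compute rsync's two-part weak checksum, s = a + b * 2^16."""
        a = sum(block) % MOD
        b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
        return a, b, a + (b << 16)

    def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int, int]:
        """Slide the window one byte: drop out_byte, append in_byte, in O(1)."""
        a = (a - out_byte + in_byte) % MOD
        b = (b - block_len * out_byte + a) % MOD
        return a, b, a + (b << 16)

    data = b"the quick brown fox jumps over the lazy dog"
    block_len = 8
    a, b, s = weak_checksum(data[0:block_len])
    # Rolling forward one byte matches recomputing the checksum from scratch.
    assert roll(a, b, data[0], data[block_len], block_len) == weak_checksum(data[1:block_len + 1])

In rsync, the side holding the old file sends per-block checksums, and the other side slides a window like this over the new file, transmitting literal data only where no block matches.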

There are domain-specific data compression methods that may outperform generic lossless compression. For example, if you want to store an m by n matrix and know for sure that it has a low rank p, you can use truncated SVD and store only two much smaller m by p and p by n matrices, then compress the residuals (presumably small, normally distributed noise) using arithmetic coding. You can also train an autoencoder neural network and hope it's smart enough to figure out whatever structure your data has. These methods require the data to follow some kind of model in order to compress well.
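To make the SVD idea concrete, here's a sketch with NumPy, assuming the rank p is known in advance and leaving out the residual entropy-coding step:

    # Truncated-SVD compression sketch: rank p is assumed known, and
    # entropy-coding of the residual is left as an exercise.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, p = 1000, 500, 10

    # Build an m x n matrix that is rank p plus small Gaussian noise.
    A = rng.standard_normal((m, p)) @ rng.standard_normal((p, n))
    A += 0.01 * rng.standard_normal((m, n))

    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_p = U[:, :p] * s[:p]   # m x p factor (singular values folded in)
    V_p = Vt[:p, :]          # p x n factor

    # Storage drops from m*n values to (m + n)*p, here 500000 -> 15000.
    residual = A - U_p @ V_p  # small noise, cheap to entropy-code
    print(np.abs(residual).max())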

Either way, you cannot compress the data below its information content. If you're transmitting pictures of a street light and all you care about is whether it's on, you can transmit one bit instead of the whole picture. But if these are going to be pictures of different street lights and every aspect of them is going to be important, including some you can't predict in advance, this adds up to many bits which you can't avoid transferring. In this case, you probably won't outperform generic image compression methods.
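The street light case is just Shannon entropy: a source that is "on" with probability p costs at most H(p) bits per observation, and often much less than one bit. A tiny illustration:

    # Shannon entropy of the one-bit street light source.
    from math import log2

    def entropy(p_on: float) -> float:
        """Bits per observation for a light that is on with probability p_on."""
        if p_on in (0.0, 1.0):
            return 0.0
        return -(p_on * log2(p_on) + (1 - p_on) * log2(1 - p_on))

    print(entropy(0.5))   # 1.0 bit -- worst case, a full bit per picture
    print(entropy(0.99))  # ~0.08 bits -- a mostly-on light compresses well below 1 bit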

If you can be more specific (what's your actual use case?), we may be able to offer better advice than that.