Re^3: Best technique to code/decode binary data for inter-machine communication?

by BrowserUk (Patriarch)
on Aug 16, 2012 at 00:38 UTC


in reply to Re^2: Best technique to code/decode binary data for inter-machine communication?
in thread Best technique to code/decode binary data for inter-machine communication?

As SuicideJunkie suggests, you were probably trying to use line-oriented transfer functions (i.e. print and readline) on a binmoded socket.

My recommendation would be to use pack/unpack & send/recv like this:

    $to->send( pack 'n/a*', $binData );
    ...
    $from->recv( my $len, 2 );
    $from->recv( my $binData, unpack 'n', $len );

That's good for packets up to 64k in length. Switch to 'N' to handle up to 4GB.

The nice thing about this is that the receiver always knows how much to ask for, and can verify that it got it (length $binData). That avoids the need for delimiters, and it works just as well with non-blocking sockets if you need to go that way.
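One caveat worth illustrating: on a stream socket, recv can return fewer bytes than were asked for, so verifying the length in practice means looping until the full count has arrived. Below is a minimal sketch of that idea, assuming a local socketpair stands in for the two machines and using the builtin send/recv rather than the socket methods. The names send_frame, recv_exact and recv_frame are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;
use Socket qw( AF_UNIX SOCK_STREAM PF_UNSPEC );

# Hypothetical helper: send one length-prefixed frame ('n' limits it to 65535 bytes).
sub send_frame {
    my( $sock, $data ) = @_;
    die "payload too long for an 'n' prefix\n" if length( $data ) > 0xFFFF;
    my $msg  = pack 'n/a*', $data;
    my $sent = send( $sock, $msg, 0 );
    defined $sent or die "send failed: $!";
    $sent == length( $msg ) or die "short send\n";
}

# Hypothetical helper: read exactly $want bytes, looping over short reads.
sub recv_exact {
    my( $sock, $want ) = @_;
    my $buf = '';
    while( length( $buf ) < $want ) {
        defined recv( $sock, my $chunk, $want - length( $buf ), 0 )
            or die "recv failed: $!";
        length( $chunk ) or die "peer closed the connection early\n";
        $buf .= $chunk;
    }
    return $buf;
}

# Hypothetical helper: read the 2-byte length, then exactly that many bytes.
sub recv_frame {
    my( $sock ) = @_;
    my $len = unpack 'n', recv_exact( $sock, 2 );
    return recv_exact( $sock, $len );
}

# A socketpair stands in for the network connection (an assumption for the demo).
socketpair( my $to, my $from, AF_UNIX, SOCK_STREAM, PF_UNSPEC )
    or die "socketpair failed: $!";

my $binData = join '', map chr, 0 .. 255;    # every byte value once
send_frame( $to, $binData );
my $got = recv_frame( $from );

print length( $got ), " bytes received, ",
    ( $got eq $binData ? 'intact' : 'corrupted' ), "\n";
```

With a real TCP connection the same helpers should apply unchanged; a production version would also loop on short sends instead of dying.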

Important update: If using this method to transmit data between machines, see also the thread at Mystery! Logical explanation or just Satan's work?

I also found that when it comes to transmitting arrays and hashes, using pack/unpack is usually more compact (and therefore faster) than using Storable, because (for example) an integer always requires 4 or 8 bytes in binary, but for many values it is shorter in ASCII:

    use Storable qw[ freeze ];;

    @a = 1..100;;
    $packed = pack 'n/(n/a*)', @a;;
    print length $packed;;
    394

    $ice = freeze \@a;;
    print length $ice;;
    412

    @b = unpack 'n/(n/a*)', $packed;;
    print "@b";;
    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 ...

    %h = 'aaaa'..'aaaz';;
    $packed = pack 'n/(n/a*)', %h;;
    print length $packed;;
    158

    $ice = freeze \%h;;
    print length $ice;;
    202

    %h2 = unpack 'n/(n/a*)', $packed;;
    pp \%h2;;
    {
      aaaa => "aaab",
      aaac => "aaad",
      aaae => "aaaf",
      aaag => "aaah",
      aaai => "aaaj",
      aaak => "aaal",
      aaam => "aaan",
      aaao => "aaap",
      aaaq => "aaar",
      aaas => "aaat",
      aaau => "aaav",
      aaaw => "aaax",
      aaay => "aaaz",
    }

It doesn't always work out smaller, but it is usually faster and platform independent.
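The round trip above could be wrapped up along these lines. This is only a sketch: pack_list and unpack_list are hypothetical names, the fields are assumed to be plain byte strings, and the checks reflect the limits of the two 'n' prefixes (at most 65535 fields of at most 65535 bytes each).

```perl
use strict;
use warnings;

# Hypothetical helper: pack a flat list as count + (length + bytes) per field.
sub pack_list {
    my @fields = @_;
    die "too many fields for an 'n' count\n" if @fields > 0xFFFF;
    for my $f ( @fields ) {
        die "undefined field\n" unless defined $f;
        die "field too long for an 'n' length prefix\n" if length( $f ) > 0xFFFF;
    }
    return pack 'n/(n/a*)', @fields;
}

# Hypothetical helper: the inverse; call it in list context.
sub unpack_list {
    my( $packed ) = @_;
    return unpack 'n/(n/a*)', $packed;
}

my @a      = 1 .. 100;
my $packed = pack_list( @a );
my @b      = unpack_list( $packed );
print length( $packed ), " bytes for ", scalar( @b ), " integers\n";   # 394 bytes, as in the session above

# A flat hash is just a list of alternating keys and values.
my %h  = 'aaaa' .. 'aaaz';
my %h2 = unpack_list( pack_list( %h ) );
print join( ',', map "$_=$h2{$_}", sort keys %h2 ), "\n";
```

Everything comes back as a string, which is harmless in Perl for integers but means nested references cannot survive the trip.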

Of course, Storable wins if your data structures can contain references to other structures.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^4: Best technique to code/decode binary data for inter-machine communication?
by sundialsvc4 (Abbot) on Aug 16, 2012 at 01:55 UTC

    I have had some interesting experiences with Storable, in the form of data which, once frozen, could not be thawed!   This was on an AS/400, and it was very data-specific, and I do not know if it was a momentary bug in whatever-it-was version of the CPAN module.   But as it was, I had to quickly scramble and store the data in the database in a different format.   (Fortunately, this was an SQLite file that didn’t have to be shared with anyone, but the occurrence of the problem surprised me greatly, nonetheless.)

Re^4: Best technique to code/decode binary data for inter-machine communication?
by flexvault (Monsignor) on Aug 16, 2012 at 13:13 UTC

    BrowserUk,

    First I'm interested in the code sample you gave:

    $to->send( pack 'n/a*', $binData );
    Currently I write that as:
    $binData = pack('N',length( $data ) ) . $data; $to->send( $binData );
    Is your code a shorthand for the above?

    Second, as you and others have pointed out, I did not use 'binmode' after opening the socket. If I were to add the following:

    binmode Socket, ":raw";
    To both the client and server code, would I be in 'binary' mode on windows, *nix, etc. or would I need to have different client code for each. Reading the latest 'binmode' documentation, it sounds like the function would be ignored on some systems and then used where binary and text definitions differ.

    Third, 'Storable' does not produce 'network neutral' results, so can't be used in this case.

    Fourth, if someone passes a ':utf8' key/value pair to my application and I store the variables in an external file as ":raw", will they be able to use the data as utf8 when they receive the key/value pair back. Until I read the 'binmode' documentation, I didn't think of that possibility!

    Thank you

    "Well done is better than well said." - Benjamin Franklin

      Is your code a shorthand for the above?

      Yes, (kinda:), but more flexible and quicker.

      The template n/a* says: pack as many arbitrary binary bytes as are contained in the argument, counting them as you go, and then prepend that data with that count as a network-order unsigned short. C/a* would pack the count as a single byte; N/a* as a network-order unsigned long; and so on.

      The really powerful template is N/(n/a*)*, which I use for arrays and hashes. It says: pack each input argument as bytes, each prefixed with its length as a network-order ushort; then prefix the whole result with a single network-order ulong that is the count of the fields packed (not a count of the bytes).
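To make the layouts described above concrete, here is a small sketch that hex-dumps what each template produces. hex_of is a hypothetical helper and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: show a packed string as hex so the prefixes are visible.
sub hex_of { return unpack 'H*', $_[0] }

my $c = pack 'C/a*', 'hi';    # 1-byte count
my $n = pack 'n/a*', 'hi';    # 2-byte big-endian count
my $N = pack 'N/a*', 'hi';    # 4-byte big-endian count

print hex_of( $c ), "\n";     # 026869
print hex_of( $n ), "\n";     # 00026869
print hex_of( $N ), "\n";     # 000000026869

# Nested form: a 4-byte field count, then each field with its own 2-byte length.
my $nested = pack 'N/(n/a*)*', 'ab', 'c';
print hex_of( $nested ), "\n";    # 0000000200026162000163

# The same template reverses it.
my @fields = unpack 'N/(n/a*)*', $nested;
print "@fields\n";                # ab c
```

Reading the last dump left to right: 00000002 is the field count, 0002 6162 is 'ab' with its length, and 0001 63 is 'c' with its length.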

      If I were to add the following: binmode Socket, ":raw"; To both the client and server code, would I be in 'binary' mode on windows, *nix, etc. or would I need to have different client code for each.

      On *nix it will do nothing; on Windows it will turn off crlf modifications (and prevent the oft-forgotten ^Z == EOF).

      My understanding is that if you use just binmode SOCKET; on all platforms, no translation will be done anywhere and you'll recv exactly what you send.

      When ':raw' first came around, it disabled all PerlIO layers; then they changed it for no good reason and without documentation. The last time I investigated it (on Windows only!), it still removed :crlf, but didn't remove all layers. To my knowledge there is no explanation available of what layers get left behind, or why.

      Third, 'Storable' does not produce 'network neutral' results, so can't be used in this case.

      Storable does have nfreeze (network order freeze) which is defined as a "portable format"; though I've never tried using it between 32/64-bit platforms.
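For a nested structure, nfreeze can be combined with the same length-prefix framing. A minimal sketch, assuming a made-up %config structure and an 'N' prefix; note that the Storable documentation itself warns against calling thaw on data from an untrusted source.

```perl
use strict;
use warnings;
use Storable qw( nfreeze thaw );

# Made-up nested data for illustration.
my %config = (
    name  => 'example',
    ports => [ 8080, 8081 ],
    opts  => { verbose => 1 },
);

# Sender: serialise in network order, then prefix with a 4-byte length.
my $frozen = nfreeze( \%config );
my $wire   = pack 'N/a*', $frozen;

# Receiver: strip the prefix, then rebuild the structure.
my( $payload ) = unpack 'N/a*', $wire;
my $copy       = thaw( $payload );

print "ports: @{ $copy->{ports} }\n";    # ports: 8080 8081
```

Over a real socket the receiver would read the 4 length bytes first and then exactly that many payload bytes, as with the pack-only approach.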

      That said, there are many horror stories of people being bitten by Storable; though at least half of them can be traced back to misunderstanding or incompetence.

      In any case, having 'discovered' the pack 'N/(n/a*)*' method of packing simple arrays and hashes, I would use that in preference to Storable for non-nested hashes and arrays.

      Fourth, if someone passes a ':utf8' key/value pair to my application and I store the variables in an external file as ":raw", will they be able to use the data as utf8 when they receive the key/value pair back.

      Until you apply some form of encode/decoding operation to a file or data stream, anything you read is just a bunch of bytes.

      If you read bytes and transmit bytes, the receiver gets the data in the same state as if it had read bytes from the original source. If those bytes constitute data encoded in some form you will need to decode it before using it -- but it doesn't matter where (which end of the connection) that decoding happens -- so long as it is done only once and correctly.

      Of course, the definition of 'correct' requires thought. If you transmit utf16le to a big-endian machine, then that machine will need to decode it as 'utf16le' (not just 'utf16' which locally might default to 'utf16be').
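A short sketch of the "decode once, and name the encoding explicitly" point, using the core Encode module. The sample text is made up, and in-memory pack/unpack stands in for the actual transmission.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

my $text = "caf\x{e9} \x{263a}";    # 6 characters, two of them non-ASCII

# Sender: characters -> bytes in an explicitly named encoding, then frame them.
my $bytes = encode( 'UTF-16LE', $text );
my $wire  = pack 'n/a*', $bytes;

# Receiver: the bytes arrive untouched; decode exactly once, naming the
# same encoding rather than relying on a platform default.
my( $received ) = unpack 'n/a*', $wire;
my $decoded     = decode( 'UTF-16LE', $received );

printf "%d bytes on the wire, %d characters after decoding\n",
    length( $received ), length( $decoded );
```

The framing layer only ever sees bytes, so the choice of encoding has to be agreed between sender and receiver by some other means.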

      But "The Unicode Problem" -- how the f*** do you know which of the many Unicode standards was used to encode the data??? -- exists wherever you do the decoding. If the receiving machine had read the same bytes from a file, it still has to either "know" (or guess) which of the Unicode Standards was used to encode the data, because the file could have come from anywhere. (eg. the internet).

      Unicode is a f*** up! And will remain that way until they finally require that each of the various binary formats that are encompassed by the Unicode (non)Standard, prefix all encoded data with something that identifies the encoding.

      From your perspective; if you will be transmitting (say) hashes built from input that has previously been decoded, then you will need to understand the Perl Unicode handling tools. I wish I could point you to a definitive reference, but no such animal seems to exist yet.



        BrowserUk,

        WOW, I like that shorthand format. N/(N/a*)* and N/(n/a*)* look very flexible for future encoding/decoding uses.

          Until you apply some form of encode/decoding operation to a file or data stream, anything you read is just a bunch of bytes.
        This was my take also!
          Unicode is a f*** up!
        Agreed! I haven't looked at this much, but the spec says '22 bits', but Perl code seems to use '24 bits' for each character with the high-order bits being '00'. Whether other implementations do the same I don't know, but seems like too much room for mis-interpretation.

        Thank you
