
Serialise to binary?

by sectokia (Pilgrim)
on Oct 25, 2015 at 22:33 UTC [id://1145896]

sectokia has asked for the wisdom of the Perl Monks concerning the following question:

Hi wise monks,

Are there any cpan modules that serialise data structures in binary and won't have the issues associated with storable for cross platform compatibility?

I was using storable, but since it wasn't portable and gave issues when switching systems, I have moved on to JSON / Data::Dumper.

However neither of these store binary data as binary, which is costing me space. My binary data is already compressed, so when I compress JSON or Data::Dumper output, it still ends up substantially bigger than storable.

Replies are listed 'Best First'.
Re: Serialise to binary?
by davido (Cardinal) on Oct 26, 2015 at 01:26 UTC

    BSON implements BSON - Binary JSON, which is "...a bin­ary-en­coded seri­al­iz­a­tion of JSON-like doc­u­ments. Like JSON, BSON sup­ports the em­bed­ding of doc­u­ments and ar­rays with­in oth­er doc­u­ments and ar­rays. BSON also con­tains ex­ten­sions that al­low rep­res­ent­a­tion of data types that are not part of the JSON spec. For ex­ample, BSON has a Date type and a BinData type."

    Sounds like that could be a decent fit, particularly since it is probably more portable than Storable.


      I tried BSON, but found that it throws a lot of warnings on basic structures. In particular, it seems to misidentify some scalars as floats and attempts to pack them as such, causing "Argument isn't numeric in pack". I also found floats having wildly inconsistent values after an encode/decode round trip.

      Because it's so heavily tied to MongoDB, they don't seem to really care about it being able to encode arbitrary data structures (evidenced by the fact that you have to pass a hash ref; no array ref is allowed). They just want to decode their own binary data as they use it in Mongo, and re-encode structures set up the same way.

      So I wouldn't recommend it.

        they don't see to really care about having it being able to encode arbitrary data structures

        I think that's an unkind assumption about intent. (N.B. I am the current maintainer.)

        But like JSON, BSON is document-oriented, so is not designed to store raw arrays or scalars the way Storable or Sereal will. So in that sense, it might not be the right choice for your needs.

        Beyond that however, the goal of BSON is to handle whatever you can throw at it as best as possible given the ambiguities mapping data between a dynamic, largely typeless language like Perl and a typed data format like BSON. Knowing that some Perl scalar is binary data and not an arbitrary string is impossible without some hints from the programmer.
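        A minimal sketch of the kind of hint meant here, assuming the pure-Perl BSON distribution and its BSON::Bytes wrapper: wrapping raw bytes tells the codec to emit a BinData element instead of guessing at a string or numeric type.

```perl
use strict;
use warnings;
use BSON;
use BSON::Bytes;

my $codec = BSON->new;

# Wrap the raw bytes in BSON::Bytes so they are encoded as BinData
# rather than left for the codec to guess at.
my $doc = {
    id   => 42,
    blob => BSON::Bytes->new( data => pack( "C*", 0 .. 255 ) ),
};

my $bson    = $codec->encode_one($doc);   # binary BSON document
my $decoded = $codec->decode_one($bson);  # blob comes back as BSON::Bytes
```

        Note that, as discussed above, the top-level container must be a document (hash ref), not an array ref.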

        The MongoDB::BSON implementation is in XS and has been part of the MongoDB driver distribution. We hope to eventually split it out so that it can be used independently where warranted.

        The BSON module itself is a pure-Perl implementation and was originally developed outside MongoDB (but has since been adopted by the company). There are still some areas where it is not yet as good as MongoDB::BSON.

        Even if BSON is not right for this particular problem, if anyone experiences bugs using either implementation, I encourage you to report them or at least email us about them so we can fix them.


Re: Serialise to binary?
by Corion (Patriarch) on Oct 26, 2015 at 09:12 UTC

    Depending on your data structure, you might have more luck with Sereal.
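    For illustration, a minimal Sereal round trip (assuming the Sereal::Encoder and Sereal::Decoder modules from CPAN). Unlike JSON, an embedded binary payload survives byte-for-byte, with no base64 or escaping overhead:

```perl
use strict;
use warnings;
use Sereal::Encoder qw(encode_sereal);
use Sereal::Decoder qw(decode_sereal);

# An arbitrary structure containing raw binary data.
my $data = {
    name => "example",
    blob => pack( "C*", 0 .. 255 ),   # binary payload, stored as-is
};

my $packed   = encode_sereal($data);   # compact binary serialisation
my $restored = decode_sereal($packed); # deep copy of the original
```

    Sereal also supports optional built-in compression (Snappy or Zlib) via encoder options, which may help with the size concerns above.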

Re: Serialise to binary?
by RichardK (Parson) on Oct 26, 2015 at 00:30 UTC

    You might have to do it yourself using pack, but it does give you 16 & 32 bit ints in both big-endian & little-endian.

    # from the docs:
    #   n   An unsigned short (16-bit) in "network" (big-endian) order.
    #   N   An unsigned long (32-bit) in "network" (big-endian) order.
    #   v   An unsigned short (16-bit) in "VAX" (little-endian) order.
    #   V   An unsigned long (32-bit) in "VAX" (little-endian) order.
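    A hand-rolled sketch along those lines, using the big-endian N format so the output is identical across architectures. The names freeze_list/thaw_list are made up for this example; the N/a* template packs each item as a 32-bit length followed by its raw bytes.

```perl
use strict;
use warnings;

# Serialise a list of byte strings as a 32-bit big-endian count,
# followed by each item as a 32-bit big-endian length plus raw bytes.
sub freeze_list {
    my @items = @_;
    my $out = pack( "N", scalar @items );
    $out .= pack( "N/a*", $_ ) for @items;   # length-prefixed string
    return $out;
}

sub thaw_list {
    my ($buf) = @_;
    my $count = unpack( "N", $buf );
    # Skip the 4-byte count, then read length-prefixed items to the end.
    my @items = unpack( "x4 (N/a*)*", $buf );
    die "corrupt stream" unless @items == $count;
    return @items;
}
```

    This handles only flat lists of byte strings; nested structures would need a type tag per element, which is where the general-purpose modules above earn their keep.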
Re: Serialise to binary?
by Laurent_R (Canon) on Oct 26, 2015 at 00:08 UTC
    Hmm, I am afraid that if you want a binary format that is cross-platform compatible, you'll have to define it yourself; I do not think there is a standard for such a format.

    Well, of course, you may find an existing binary format for data exchange between a given platform pair, or even for a few platforms, but I do not think there is any standard one that is really cross-platform in the wider sense. And there are plenty of quite compelling reasons for that, one of them being that there is not one binary format, but several.

    Having said that, some network protocols do define low-level binary formats that you might want to use. But I am not convinced that such low-level formats fit your functional/business needs. You did not say enough about your requirements for a definite answer (which I would probably not be able to give anyway; I haven't worked in this area for about 15 years and I don't remember enough about these things).

Re: Serialise to binary?
by Your Mother (Archbishop) on Oct 25, 2015 at 23:39 UTC
    My binary data is already compressed, so when I compress JSON or Data::Dumper output, it still ends up substantially bigger than storable.

    This is highly unlikely. If you serialize without whitespace and compress, and it is still substantially bigger than Storable…? This is not my forte, but post your code and I'm sure someone can show you where it's gone sideways.

      An example is when there is a huge number of scalars with random contents. Here compressed Storable has about 33% overhead, whereas compressed JSON has 70%+ overhead.
      use strict;
      use warnings;
      use Storable;
      use IO::Compress::Gzip qw(gzip);
      use JSON::XS;

      my ( @data, $serial, $gzserial, $json, $gzjson, $i );
      for ( $i = 0; $i < 100000; $i++ ) { push @data, chr( int( rand(256) ) ) }
      $serial = Storable::nfreeze( \@data );
      $json   = encode_json( \@data );
      gzip \$json   => \$gzjson;
      gzip \$serial => \$gzserial;
      print scalar(@data) . "\n";
      print length($serial) . "\n";
      print length($gzserial) . "\n";
      print length($json) . "\n";
      print length($gzjson) . "\n";

        Oh, nice! I was about to argue that one-character scalars, which make quotation marks more than 60% of the data, rigged the test in favor of Storable, but I upped the "word" size and the difference remains at about 30% in favor of Storable. Sidebar: on my box at least, Storable sees *negative* change from zipping: i.e., the zipped Storable is slightly bigger than the raw nstore.

Re: Serialise to binary?
by BrowserUk (Patriarch) on Oct 26, 2015 at 15:18 UTC
    My binary data is already compressed, so when I compress JSON or Data::Dumper output, it still ends up substantially bigger than storable.

    How big is this data?

    How often are you interchanging this data?
