Re: Storing state of execution

by stevieb (Canon)
on Dec 09, 2015 at 14:07 UTC


in reply to Storing state of execution

I've used Storable with great success. Another method I've taken to lately is using JSON, which stores in plain text, but is cross-language (I can write in Perl/Python/insert-language-here, then open it back up with any other one). You could also use Data::Dumper to store and retrieve state (Perl only).
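A minimal sketch of both approaches (the file names and sample data are just placeholders):

use Storable qw(store retrieve);
use JSON::PP qw(encode_json decode_json);   # JSON::XS works the same way, only faster

my %state = ( step => 42, queue => [ 'a', 'b', 'c' ] );

# Binary, Perl-only, fast:
store( \%state, 'state.storable' );
my $restored = retrieve('state.storable');

# Plain text, readable from any language with a JSON parser:
open my $fh, '>', 'state.json' or die $!;
print {$fh} encode_json(\%state);
close $fh;

open $fh, '<', 'state.json' or die $!;
my $again = decode_json( do { local $/; <$fh> } );
close $fh;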

Replies are listed 'Best First'.
Re^2: Storing state of execution
by SuicideJunkie (Vicar) on Dec 09, 2015 at 15:05 UTC

    I would add that which one you want depends a lot on the data you need to store.

    If the data is simple and modestly sized, then JSON (or YAML) would probably be best.

    If your data includes reference loops or binary data, or if the data structure is large (speed becomes an issue) then Storable would probably be best.
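    A quick sketch of the reference-loop case (the file name is arbitrary; the JSON encoders, by contrast, croak on such a structure):

    use Storable qw(store retrieve);

    my %node = ( name => 'root' );
    $node{self} = \%node;                  # a reference loop

    store( \%node, 'state.sto' );          # Storable follows the cycle without looping forever
    my $copy = retrieve('state.sto');
    print $copy->{self}{name}, "\n";       # prints "root" -- the loop is restored, too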

    Dumper is more of a middle ground: it suits cases where you need the output to be coder-readable and the data structure is complex, but only if you can absolutely trust the source of the data when you read it back. I'd only recommend it as debug output, since reading it back in involves running arbitrary perl code.

      or if the data structure is large (speed becomes an issue) then Storable would probably be best.

      That doesn't match my memory. A quick test showed JSON::XS taking just over 1/3 of the time of Storable (and producing almost exactly the same number of bytes of output).
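      Something along these lines (a rough sketch with made-up sample data; it assumes JSON::XS and Storable are installed) reproduces that kind of comparison:

      use Benchmark qw(cmpthese);
      use JSON::XS  qw(encode_json decode_json);
      use Storable  qw(freeze thaw);

      # made-up sample data, just to have something non-trivial to serialize
      my %state = map { $_ => { id => $_, items => [ 1 .. 20 ] } } 1 .. 1000;

      cmpthese( -3, {
          json     => sub { my $copy = decode_json( encode_json(\%state) ) },
          storable => sub { my $copy = thaw( freeze(\%state) ) },
      });

      printf "JSON: %d bytes, Storable: %d bytes\n",
             length( encode_json(\%state) ), length( freeze(\%state) );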

      Using JSON has other advantages. And I consider forcing one to stick to simple data to be one of them.

      - tye        

Re^2: Storing state of execution
by afoken (Chancellor) on Dec 10, 2015 at 05:56 UTC

    One big problem with Storable is that its exact file format depends on the perl version and on the machine perl was compiled for. Changing the processor architecture and/or the perl version is asking for trouble.

    Data::Dumper generates executable perl code that has to be parsed back into the program using string eval. That works, sure, but it is a security nightmare: Imagine someone inserting system "rm -rf /" into the saved dump.

    Data::Dumper does not dump everything; sometimes it just generates dummy code:

    >perl -MData::Dumper -E 'my $double=sub { return 2*shift }; say Dumper($double)'
    $VAR1 = sub { "DUMMY" };

    JSON, XML, and YAML don't have those problems. They simply don't allow code references, and they are all independent of the perl version and the processor architecture.
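    A quick illustration with JSON (a sketch; the exact error text depends on the module and version):

    use JSON::PP;

    my $data = { double => sub { return 2*shift } };
    my $json = eval { JSON::PP->new->encode($data) };
    print $@;   # dies with something like "encountered CODE(0x...), but JSON can
                # only represent references to arrays or hashes" -- no dummy code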

    XML can't store binary data, because some characters (such as 0x00) are not allowed in XML, not even in escaped form. You have to resort to a hex dump, base64, or quoted-printable encoding.
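    For example, with the core MIME::Base64 module (a sketch):

    use MIME::Base64 qw(encode_base64 decode_base64);

    my $binary    = "\x00\x01\xFFraw bytes";        # illegal in XML as-is
    my $encoded   = encode_base64($binary, '');     # safe to embed as element text
    my $roundtrip = decode_base64($encoded);        # identical to $binary again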

    XML stores some data multiple times (opening and closing tags contain the element name), wasting more disk space than other formats.

    JSON has data types (string, number, array, key-value pairs, booleans, and null alias undef). It lacks some higher-level data types, most notably a date-and-time type. Usually one uses strings or key-value pairs ("objects") for that, but you could also use a number (counting days or seconds since some epoch). Reading back JSON with dates stored in strings or objects requires some knowledge about the data: you need to know whether a string is a date in disguise or just a string.
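    In other words, producer and consumer have to agree on a convention, e.g. (a sketch using an ISO 8601 string):

    use JSON::PP qw(encode_json decode_json);

    my $json = encode_json({ created => '2015-12-10T05:56:00Z', note => 'just a string' });
    my $data = decode_json($json);
    # Both values come back as plain strings; only the application knows that
    # 'created' is a date in disguise and has to re-parse it (e.g. with Time::Piece).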

    JSON does not define comments. Some JSON parsers allow them anyway: JSON::XS accepts shell-style # comments, but those do not fit into a Javascript context (the language JSON is derived from). Javascript has /* */ and // comments; those would make the most sense to use in JSON.
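    JSON::XS only accepts those # comments when the parser is switched into relaxed mode; a strict parser rejects the same text (a sketch):

    use JSON::XS;

    my $text = qq({\n  "step": 42  # shell-style comment\n}\n);

    my $strict  = eval { JSON::XS->new->decode($text) };    # dies: comments are not JSON
    my $relaxed = JSON::XS->new->relaxed->decode($text);    # works, $relaxed->{step} == 42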

    YAML: I can't get it into my head. There are at least two or three ways to represent the same information, and some just don't make sense to me. I try to avoid YAML.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Data::Dumper does not dump everything, sometimes, it just generates dummy code
      Unless you specify
      $Data::Dumper::Deparse = 1;
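      With that flag set, the one-liner from above prints the deparsed body of the sub instead of the "DUMMY" placeholder. A sketch (the exact deparsed output varies with the Perl version and may be preceded by use strict/use feature lines, as in the examples below):

      >perl -MData::Dumper -E '$Data::Dumper::Deparse=1; my $double=sub { return 2*shift }; say Dumper($double)'
      $VAR1 = sub {
                  return 2 * shift();
              };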
      ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

        But the deparsed code does not contain all required information in all cases:

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use feature 'state';
        use Data::Dumper;

        $Data::Dumper::Deparse=1;

        our %data;
        our $VAR1;

        sub insert
        {
            my ($name,$href)=@_;
            $href->{$name}=$href->{'nextID'}->();
            print $name," => ",$href->{$name},"\n";
        }

        sub init_data
        {
            %data=(
                nextID => sub {
                    state $n=100;
                    $n++;
                    print "\$n is now $n\n";
                    return $n;
                }
            );
        }

        init_data();

        print "Working with the original:\n\n";
        insert(a => \%data);
        insert(b => \%data);
        insert(c => \%data);

        my $dump=Dumper(\%data);
        print "\nData::Dumper output:\n\n";
        print "$dump\n";

        print "\nWorking with the original again:\n\n";
        insert(d => \%data);

        print "\nWorking with the re-evaluated Data::Dumper output:\n\n";
        eval $dump;
        die $@ if $@;
        insert(d => $VAR1);

        Output:

        Working with the original:

        $n is now 101
        a => 101
        $n is now 102
        b => 102
        $n is now 103
        c => 103

        Data::Dumper output:

        $VAR1 = {
                  'c' => 103,
                  'a' => 101,
                  'nextID' => sub {
                                  use warnings;
                                  use strict;
                                  use feature 'state';
                                  state $n = 100;
                                  ++$n;
                                  print "\$n is now $n\n";
                                  return $n;
                              },
                  'b' => 102
                };

        Working with the original again:

        $n is now 104
        d => 104

        Working with the re-evaluated Data::Dumper output:

        $n is now 101
        d => 101

        Yes, this is constructed. But it shows that deparsing the sub reference is not sufficient to restore all state after a Data::Dumper-eval cycle. The state of $n is lost, creating two colliding IDs.

        It gets even worse without the state feature:

        #!/usr/bin/perl -w
        use strict;
        use warnings;
        use Data::Dumper;

        $Data::Dumper::Deparse=1;

        our %data;
        our $VAR1;

        sub insert
        {
            my ($name,$href)=@_;
            $href->{$name}=$href->{'nextID'}->();
            print $name," => ",$href->{$name},"\n";
        }

        sub init_data
        {
            my $n=100;
            %data=(
                nextID => sub {
                    $n++;
                    print "\$n is now $n\n";
                    return $n;
                }
            );
        }

        init_data();

        print "Working with the original:\n\n";
        insert(a => \%data);
        insert(b => \%data);
        insert(c => \%data);

        my $dump=Dumper(\%data);
        print "\nData::Dumper output:\n\n";
        print "$dump\n";

        print "\nWorking with the original again:\n\n";
        insert(d => \%data);

        print "\nWorking with the re-evaluated Data::Dumper output:\n\n";
        eval $dump;
        die $@ if $@;
        insert(d => $VAR1);

        Output:

        Working with the original:

        $n is now 101
        a => 101
        $n is now 102
        b => 102
        $n is now 103
        c => 103

        Data::Dumper output:

        $VAR1 = {
                  'c' => 103,
                  'b' => 102,
                  'a' => 101,
                  'nextID' => sub {
                                  use warnings;
                                  use strict;
                                  ++$n;
                                  print "\$n is now $n\n";
                                  return $n;
                              }
                };

        Working with the original again:

        $n is now 104
        d => 104

        Working with the re-evaluated Data::Dumper output:

        Global symbol "$n" requires explicit package name at (eval 8) line 8.
        Global symbol "$n" requires explicit package name at (eval 8) line 9.
        Global symbol "$n" requires explicit package name at (eval 8) line 10.

        On the other hand, complaining loudly is better than just generating repeated IDs.

        Stupidly removing use strict and use warnings from the code hides the error, and results in worse behaviour:

        #!/usr/bin/perl -w
        use Data::Dumper;

        $Data::Dumper::Deparse=1;

        our %data;
        our $VAR1;

        sub insert
        {
            my ($name,$href)=@_;
            $href->{$name}=$href->{'nextID'}->();
            print $name," => ",$href->{$name},"\n";
        }

        sub init_data
        {
            my $n=100;
            %data=(
                nextID => sub {
                    $n++;
                    print "\$n is now $n\n";
                    return $n;
                }
            );
        }

        init_data();

        print "Working with the original:\n\n";
        insert(a => \%data);
        insert(b => \%data);
        insert(c => \%data);

        my $dump=Dumper(\%data);
        print "\nData::Dumper output:\n\n";
        print "$dump\n";

        print "\nWorking with the original again:\n\n";
        insert(d => \%data);

        print "\nWorking with the re-evaluated Data::Dumper output:\n\n";
        eval $dump;
        die $@ if $@;
        insert(d => $VAR1);

        Output:

        Working with the original:

        $n is now 101
        a => 101
        $n is now 102
        b => 102
        $n is now 103
        c => 103

        Data::Dumper output:

        $VAR1 = {
                  'a' => 101,
                  'c' => 103,
                  'nextID' => sub {
                                  ++$n;
                                  print "\$n is now $n\n";
                                  return $n;
                              },
                  'b' => 102
                };

        Working with the original again:

        $n is now 104
        d => 104

        Working with the re-evaluated Data::Dumper output:

        $n is now 1
        d => 1

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      afoken always gives very exhaustive explanations. ++ as always. But isn't it also worth mentioning Sereal? I have used it with profit, but I have not pushed its limits, since mine was only a plain usage of it.

      Do you have experience with this too?

      L*

      PS What they say about their module is definitely intriguing! See the Sereal Comparison Graphs.

      L*
      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
        But isn't it also worth mentioning Sereal?

        Never used it. I stumbled over Sereal some time ago, then forgot it, because I did not need it. It looks quite promising, and has similarities to various other binary formats (like BSON, BJSON, MessagePack).
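        For completeness, the basic API looks like this (an untested sketch going by the module documentation; Sereal::Encoder and Sereal::Decoder come from CPAN):

        use Sereal::Encoder qw(encode_sereal);
        use Sereal::Decoder qw(decode_sereal);

        my $blob = encode_sereal( { step => 42, queue => [ 1, 2, 3 ] } );
        my $data = decode_sereal($blob);    # a binary blob, not readable in a text editor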

        All of those formats promise compact data storage and easy parsing. But you lose one big advantage of text-based file formats: you can not simply read them using less, your favorite web browser, or your favorite text editor. You need a converter and/or a special viewer.

        If storage size or data transfer volume is an issue, the text-based formats can usually be compressed quite well, resulting in sizes similar to binary formats.
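        For example, with the core IO::Compress::Gzip module (a sketch):

        use IO::Compress::Gzip qw(gzip $GzipError);
        use JSON::PP qw(encode_json);

        my $json = encode_json( { queue => [ 1 .. 10_000 ] } );
        gzip \$json => \my $compressed or die "gzip failed: $GzipError";
        printf "plain: %d bytes, gzipped: %d bytes\n", length $json, length $compressed;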

        As usual, Wikipedia has a big list, containing both binary and text-based formats: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        I prefer to have a storage format that by definition can not contain executable code instead of relying on a filter that tries to prevent malicious code execution inside a string eval. One bug in Safe and the "SafestUndumper" is no longer safe, but instead happily executes malicious code.

        Also, the "non-executable" formats force the programmer to use a parser. There is no way to accidentally or intentionally use a string eval on those formats.

        So, who would intentionally use a string eval on untrusted code?

        • The new programmer who does not know enough about the project.
        • The new programmer who did not learn the style guide by heart.
        • The lazy programmer who thinks "It's just a quick hack, I'll use string eval for now because I trust my current, hand-written config file, and fix that problem later." (We all know from experience that it won't be fixed until at least a few years later.)
        • The stupid programmer who thinks "all of those stinking modules are just a stupid waste of time, eval is much faster".

        A little bit of bean counting:

        Actually, every storage format that can contain strings can - in theory - also contain executable Perl code. But when reading back formats like XML or JSON, an explicit string eval on an extracted string is required, and that string eval is not present in the library reading the file format (or, at least, it should not be present).
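        The contrast in one place (a sketch; the parser route never executes anything from the file):

        use JSON::PP qw(decode_json);

        my $untrusted_text = do { local $/; <STDIN> };   # e.g. a saved state file

        # Parser route: the payload is only ever treated as data.
        my $state = decode_json($untrusted_text);

        # Eval route: whatever the "dump" contains gets executed,
        # including something like  system "rm -rf /";
        # my $state = eval $untrusted_text;              # don't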

        Oh, and string eval means more than just eval $string:

        • do $filename is a string eval on a file's content - exactly what the four programmers from above would like to use to undump Data::Dumper output.
        • require $filename - it's do $filename at the core, plus a little bit of bookkeeping to avoid repeated reading of the file.
        • use Module - require plus import, wrapped in an implicit BEGIN block.
        • evalbytes $bytes - new since v5.16

        And finally: Any Javascript compiler/interpreter must be able to read and execute JSON, as it is a very restricted subset of Javascript/ECMAScript. That also means that using Javascript's eval (always a string eval) to read JSON is a tempting, but stupid idea, on the same level as using Perl's string eval to read Data::Dumper output. Since ECMAScript Fifth Edition (2009), there has been a dedicated JSON parser built into the Javascript environment (see https://github.com/douglascrockford/JSON-js/blob/master/README).

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      ++ That's a spectacular explanation.
