Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

serialise/unserialising data structures

by monoxide (Beadle)
on May 29, 2005 at 12:51 UTC ( #461506=sourcecode: print w/replies, xml ) Need Help??
Category: Text Processing
Author/Contact Info monoxide0184@gmail.com
Description: These two functions will serialise (bEncode) or unserialise (bDecode) data. It is based on 'Bencoding' which was designed as part of the bittorrent protocol but seems to be a well designed way of doing things. This is a perl implementation of this encoding. Below is a list of how this encoding works, and covers everthing from scalars to hashs, very nicely. I know this is _VERY_ poorly commented, but if someone has some spare time and wants to comment these 100 odd lines, it would be appreciated :).

* Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'.
* Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e , are invalid, other than i0e , which of course corresponds to 0.
* Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs'].
* Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow' => 'moo', 'spam' => 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam' => ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).
#!/usr/bin/perl -w
#

#####
# Provides fuctions for bencoding/decoding
###

# Decodes a bencoded string to a hash/array reference
sub bDecode {
    my $dictref = $_[0];
    my $retref = undef;
    
    if (substr(${$dictref},0,1) eq "d") {
#        print Dumper($dictref);
        $retref = {};
        ${$dictref} = substr(${$dictref},1);
        PROCDICTHASH: while (substr(${$dictref},0,1) ne 'e') {
            my ($key, $value);
            
            ${$dictref} =~ /^(-?[\d]+)/;
            $key = substr(${$dictref}, length($1)+1, $1);
            ${$dictref} = substr(${$dictref}, length($1)+$1+1);
            if (${$dictref} =~ /^[ldi\d]/) {
                $value = bDecode(\${$dictref});
                if (!defined($value)) {
                    return undef;
                }
            } else {
                return undef;
            }
            
            $retref->{$key} = $value;
        }
        ${$dictref} = substr(${$dictref},1);
    } elsif (substr(${$dictref},0,1) eq "l") {
#        print Dumper($dictref);
        $retref = [];
        ${$dictref} = substr(${$dictref},1);
        PROCDICTARR: while (substr(${$dictref},0,1) ne 'e') {
            if (${$dictref} =~ /^[ldi\d]/) {
                my $value = bDecode(\${$dictref});
                if (!defined($value)) {
                    return undef;
                }
                push (@$retref, $value);
            } else {
                return undef;
            }
        }
        ${$dictref} = substr(${$dictref},1);
    } elsif (${$dictref} =~ /^i(-?[\d]+)e/) {
        ${$dictref} = substr(${$dictref}, length($1)+2);
        return $1;
    } elsif (${$dictref} =~ /^([\d]+):/) {
        $retref = substr(${$dictref}, length($1)+1, $1);
        ${$dictref} = substr(${$dictref}, length($1)+$1+1);
        return $retref;
    } else {
        return undef;
    }

    return $retref;
}

#Encodes a hash/array ref to a bencoded string
sub bEncode {
    my $dictref = $_[0];
    my $retval = '';
    if (ref($dictref) eq 'HASH') {
        $retval = 'd';
        for my $key (sort keys %$dictref) {
            $retval .= length($key).':'.$key;
            $retval .= bEncode($dictref->{$key});
        }
        $retval .= 'e';
    } elsif (ref($dictref) eq 'ARRAY') {
        $retval = 'l';
        for (my $i = 0; $i <= $#{$dictref}; $i++) {
            $retval .= bEncode($dictref->[$i]);
        }
        $retval .= 'e';
    } elsif ($dictref =~ /^-?[\d]+$/) {
        $retval = 'i'.$dictref.'e';
    } else {
        $retval = length($dictref).':'.$dictref;
    }
    return $retval;
};
Replies are listed 'Best First'.
Re: serialise/unserialising data structures
by castaway (Parson) on May 30, 2005 at 11:06 UTC
    Not bad.. I was playing with bdecode/bencode recently (implemented them in elisp, fun) ..

    A comment or two: Your code actually only encodes/decodes lists and dictionaries, it will completely ignore single values, eg it wont parse "4:spam" at all. You also have the string/integer parsing code in there twice. You can solve both of these in one go by making the main if (substr(${$dictref},0,1) eq "d") { statement also do strings and integers, and recursing when encountering these inside the dictionary/hash, instead of parsing them on the spot.

    Ditto the encoding.

    Pretty nifty all the same :)

    C.

      The main reason for not doing strings/integers was the idea that 98% of the things that you would want to do would be some sort of list/array not a single value, but i can see the value in doing it that way... also something that is not "ideal" is the fact that it will erase the string you pass to it in the decode function, returning the hash/array, but deleting everything out of the scalar ref that you pass to it. ie.
      bDecode(\$str); # $str is now == ''
      Any suggestions on fixing this would be appreciated.

      NOTE: I implemented castaway's suggestions. Very nice idea there :).
Re: serialise/unserialising data structures
by zentara (Archbishop) on May 30, 2005 at 21:48 UTC
    Just as an observation, I was able to make it run with "use strict;" with the following addition:
    #Encodes a hash/array ref to a bencoded string sub bEncode { my $dictref = $_[0]; my $retval = ''; #add this line my ($key, $value); if (ref($dictref) eq 'HASH') { ..... .....

    I'm not really a human, but I play one on earth. flash japh
      Hmmm... i use it with use strict; as it is there... or at least it is using strict in the file that do's the file that those functions are in, and it works fine. If you notice, that line is right after the if statement, $key is only use in the HASH ref section, $value is used in both the ARRAY and HASH ref section, and both are appropriatly myed, so i cannot see why you had troubles.
        Yes it's weird, I just downloaded the code again, and it worked fine, but I still have the original file which gave me errors. I include it here, just because I can't see where the difference is? Maybe a } got clipped somewhere? It has to be Monday. :-)

        I'm not really a human, but I play one on earth. flash japh
Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: sourcecode [id://461506]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (4)
As of 2020-06-01 23:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you really want to know if there is extraterrestrial life?



    Results (12 votes). Check out past polls.

    Notices?