http://qs321.pair.com?node_id=461506
Category: Text Processing
Author/Contact Info monoxide0184@gmail.com
Description: These two functions will serialise (bEncode) or unserialise (bDecode) data. It is based on 'Bencoding' which was designed as part of the bittorrent protocol but seems to be a well designed way of doing things. This is a perl implementation of this encoding. Below is a list of how this encoding works, and covers everthing from scalars to hashs, very nicely. I know this is _VERY_ poorly commented, but if someone has some spare time and wants to comment these 100 odd lines, it would be appreciated :).

* Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'.
* Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e , are invalid, other than i0e , which of course corresponds to 0.
* Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs'].
* Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow' => 'moo', 'spam' => 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam' => ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).
#!/usr/bin/perl -w
#

#####
# Provides fuctions for bencoding/decoding
###

# Decodes a bencoded string to a hash/array reference
sub bDecode {
    my $dictref = $_[0];
    my $retref = undef;
    
    if (substr(${$dictref},0,1) eq "d") {
#        print Dumper($dictref);
        $retref = {};
        ${$dictref} = substr(${$dictref},1);
        PROCDICTHASH: while (substr(${$dictref},0,1) ne 'e') {
            my ($key, $value);
            
            ${$dictref} =~ /^(-?[\d]+)/;
            $key = substr(${$dictref}, length($1)+1, $1);
            ${$dictref} = substr(${$dictref}, length($1)+$1+1);
            if (${$dictref} =~ /^[ldi\d]/) {
                $value = bDecode(\${$dictref});
                if (!defined($value)) {
                    return undef;
                }
            } else {
                return undef;
            }
            
            $retref->{$key} = $value;
        }
        ${$dictref} = substr(${$dictref},1);
    } elsif (substr(${$dictref},0,1) eq "l") {
#        print Dumper($dictref);
        $retref = [];
        ${$dictref} = substr(${$dictref},1);
        PROCDICTARR: while (substr(${$dictref},0,1) ne 'e') {
            if (${$dictref} =~ /^[ldi\d]/) {
                my $value = bDecode(\${$dictref});
                if (!defined($value)) {
                    return undef;
                }
                push (@$retref, $value);
            } else {
                return undef;
            }
        }
        ${$dictref} = substr(${$dictref},1);
    } elsif (${$dictref} =~ /^i(-?[\d]+)e/) {
        ${$dictref} = substr(${$dictref}, length($1)+2);
        return $1;
    } elsif (${$dictref} =~ /^([\d]+):/) {
        $retref = substr(${$dictref}, length($1)+1, $1);
        ${$dictref} = substr(${$dictref}, length($1)+$1+1);
        return $retref;
    } else {
        return undef;
    }

    return $retref;
}

#Encodes a hash/array ref to a bencoded string
sub bEncode {
    my $dictref = $_[0];
    my $retval = '';
    if (ref($dictref) eq 'HASH') {
        $retval = 'd';
        for my $key (sort keys %$dictref) {
            $retval .= length($key).':'.$key;
            $retval .= bEncode($dictref->{$key});
        }
        $retval .= 'e';
    } elsif (ref($dictref) eq 'ARRAY') {
        $retval = 'l';
        for (my $i = 0; $i <= $#{$dictref}; $i++) {
            $retval .= bEncode($dictref->[$i]);
        }
        $retval .= 'e';
    } elsif ($dictref =~ /^-?[\d]+$/) {
        $retval = 'i'.$dictref.'e';
    } else {
        $retval = length($dictref).':'.$dictref;
    }
    return $retval;
};