Category: | Text Processing |
Author/Contact Info | monoxide0184@gmail.com |
Description: | These two functions will serialise (bEncode) or unserialise (bDecode) data. It is based on 'Bencoding' which was designed as part of the bittorrent protocol but seems to be a well designed way of doing things. This is a perl implementation of this encoding. Below is a list of how this encoding works, and covers everthing from scalars to hashs, very nicely. I know this is _VERY_ poorly commented, but if someone has some spare time and wants to comment these 100 odd lines, it would be appreciated :). * Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'. * Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e , are invalid, other than i0e , which of course corresponds to 0. * Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs']. * Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow' => 'moo', 'spam' => 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam' => ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics). |
#!/usr/bin/perl -w # ##### # Provides fuctions for bencoding/decoding ### # Decodes a bencoded string to a hash/array reference sub bDecode { my $dictref = $_[0]; my $retref = undef; if (substr(${$dictref},0,1) eq "d") { # print Dumper($dictref); $retref = {}; ${$dictref} = substr(${$dictref},1); PROCDICTHASH: while (substr(${$dictref},0,1) ne 'e') { my ($key, $value); ${$dictref} =~ /^(-?[\d]+)/; $key = substr(${$dictref}, length($1)+1, $1); ${$dictref} = substr(${$dictref}, length($1)+$1+1); if (${$dictref} =~ /^[ldi\d]/) { $value = bDecode(\${$dictref}); if (!defined($value)) { return undef; } } else { return undef; } $retref->{$key} = $value; } ${$dictref} = substr(${$dictref},1); } elsif (substr(${$dictref},0,1) eq "l") { # print Dumper($dictref); $retref = []; ${$dictref} = substr(${$dictref},1); PROCDICTARR: while (substr(${$dictref},0,1) ne 'e') { if (${$dictref} =~ /^[ldi\d]/) { my $value = bDecode(\${$dictref}); if (!defined($value)) { return undef; } push (@$retref, $value); } else { return undef; } } ${$dictref} = substr(${$dictref},1); } elsif (${$dictref} =~ /^i(-?[\d]+)e/) { ${$dictref} = substr(${$dictref}, length($1)+2); return $1; } elsif (${$dictref} =~ /^([\d]+):/) { $retref = substr(${$dictref}, length($1)+1, $1); ${$dictref} = substr(${$dictref}, length($1)+$1+1); return $retref; } else { return undef; } return $retref; } #Encodes a hash/array ref to a bencoded string sub bEncode { my $dictref = $_[0]; my $retval = ''; if (ref($dictref) eq 'HASH') { $retval = 'd'; for my $key (sort keys %$dictref) { $retval .= length($key).':'.$key; $retval .= bEncode($dictref->{$key}); } $retval .= 'e'; } elsif (ref($dictref) eq 'ARRAY') { $retval = 'l'; for (my $i = 0; $i <= $#{$dictref}; $i++) { $retval .= bEncode($dictref->[$i]); } $retval .= 'e'; } elsif ($dictref =~ /^-?[\d]+$/) { $retval = 'i'.$dictref.'e'; } else { $retval = length($dictref).':'.$dictref; } return $retval; }; |
Back to
Code Catacombs