Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Converting an array of bytes into a BigInt - speed considerations

by dempa (Friar)
on Jan 23, 2003 at 12:17 UTC ( [id://229290]=perlquestion: print w/replies, xml ) Need Help??

dempa has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!

In my current project (reinventing the wheel: a request tracking system) I use Crypt::RSA to encrypt username and password strings when sent between client and server. To exchange the public key between two perl clients are a piece of cake but I need to be able to exchange the key with a Visual Basic.NET client aswell. To do this I convert the modulus (a really big integer) to an array of bytes and then to a comma separated string. This is fairly easy to import in .NET Framework cryptography API (it works, I've tested it).

Anyway, I use the module Math::BigInt to be able to handle really large numbers. I have two subroutines: bigint_to_bytearray($bigint) and bytearray_to_bigint(@bytearray). The first one (used when converting the modulus to an array of bytes) runs fast on my development box (a Pentium II 350MHz) but the second one (used when converting an array of bytes back to a bigint) takes 4-5 seconds.

I'm wondering if there's something I can do to speed this up, besides getting a better development box? :)

sub bigint_to_bytearray { my $bigint = shift; my @bytes; while(1) { my ($q,$r) = $bigint->brsft(8); push(@bytes,$r+0); last if $q == 0; $bigint = Math::BigInt->new($q); } return @bytes; }
## This is the one I would like to speed up ## The array looks something like (127,6,64,27,166,33 .... ) sub bytearray_to_bigint { my @array = @_; my $count = Math::BigInt->new('0'); my $result = Math::BigInt->new('0'); foreach my $a (@array) { $result += $a * (256**$count++); } return $result; }

-- 
dempa

Replies are listed 'Best First'.
Re: Converting an array of bytes into a BigInt - speed considerations
by bart (Canon) on Jan 23, 2003 at 12:47 UTC
    It sounds like you would want pack() or unpack() for a BigInt. But as far as I can see, the docs don't mention this as an option.

    OK, perhaps a slightly faster implementation might be to replace the multiplication by a shift left. And definitely drop the exponentiation if you can. I suspect this is the real cause for the long processing time.

    Er... you have a Little Endian representation in the "byte array", haven't you? Big Endian is easier to process. OK, here I go:

    use Math::BigInt; print bytearray_to_bigint(127,2); use Data::Dumper; sub bytearray_to_bigint { my @array = reverse @_; #Big Endian my $result = Math::BigInt->new('0'); foreach my $a (@array) { ($result <<= 8) += $a; # This alternative doesn't work: # $result = $result->blsft(8)->badd($a); } return $result; }
    To be honest, I'd rather have replaced the overloaded operators with explicit calls to blsft and badd, but for some reason unknown to me these appear to return simple strings, not Math::BigInt objects, as I gather from the docs they should. (BTW I'm using 0.01, as it comes with perl 5.6.1.) Oh well. I hope this is a decent starting point anyway.

    Update: I have the impression that Math::BigInt is a Pure Perl implementation by default. Ugh! If you use it with one of the XS "driver" libraries, you might get a real speed boost. See the section on "MATH LIBRARY" in the new docs. Math::BigInt::BitVect+Bit::Vector and Math::BigInt::FastCalc look like the most promising candidates to me. You might have to download an update to Math::BigInt to make use of it. They are available for Windows, Perl 5.6.x at the 5.6plus repository. Ooh, "Math::BigIntFast" looks nice, too, even though I can't find a trace of it on CPAN. Odd.

    Update: It suddenly dawned on me that it's easy to get up to a factor 4 of speedup, by unpacking your byte stream into long integers (unsigned 32 bits) instead of in bytes. That way, you can construct your bigint in 4 bytes per step, instead of just 1 byte for each loop.

    Unpacking a string to longs, on a PC (Little Endioan), can be done using:

    @ulongs = unpack 'V*', $bytestring;
    Of course, that means that you need to shift left by 32 bits, instead of by 8, per step.

      Thanks for the suggestions (everyone)!

      I just updated Math::BigInt to v1.64 and now my program won't work. Apparently, brsft doesn't return the remainder anymore, only the quotient (it used to return a list with (quo,rem)).

      Why, oh, why didn't I pay more attention in math class at college! :)

      update: I was just voicing my frustration, not asking anyone to post a solution for me.

      -- 
      dempa

        That's a bummer. However, it needn't be a total disaster. You can get both quotient and remainder, by making two calls on the same data. For example, band() seems to work just fine as follows, to get at the remainder of a division by 256:
        use Math::BigInt; $a = new Math::BigInt(0xFE78A3); printf '%02X', $a->band(0xFF);
        Result:
        A3
        This is with Math::BigInt version 1.64.

        Update: Yeah, the real solution would be for you to go back to school. ;-)

Re: Converting an array of bytes into a BigInt - speed considerations
by jmcnamara (Monsignor) on Jan 23, 2003 at 13:42 UTC

    On the assumption that a shift would be faster than a multiplication/exponentiation I tried this:
    sub bytearray_to_bigint2 { my $result = Math::BigInt->new('0'); for my $byte (reverse @_) { my $tmp = Math::BigInt->new($byte); $result = $result->blsft(8); $result += $tmp; } return $result; }

    This appears to be faster, see the bechmark below. The speed increases for bigger ints.

      Does it give the proper results? As in the old version I'm using, blsft() appears to return a string not a bigint object, $result += $tmp; might not do the proper thing.
      use Math::BigInt; use Data::Dumper; print Dumper +Math::BigInt->new(1)->blsft(8); print $Math::BigInt::VERSION;
      Result:
      $VAR1 = '+256'; 0.01
      OTOH, $tmp is a bigint object. Hmm...

      OK, no reason to panic:

      $result = 1; $result += new Math::BigInt('1234567890' x 10);
      does seem to work alright.

        It does give the correct results. Give me some credit. ;-)

        I've updated the above benchmark to include the test that I removed before posting.

        I'd agree that the Math::BigInt objects behave a little bit counter-intuitively.

        --
        John.

Re: Converting an array of bytes into a BigInt - 1500% quicker.
by BrowserUk (Patriarch) on Jan 23, 2003 at 17:17 UTC

    This benchmarks as around %1500 quicker than the original.

    s/iter org new best org 5.62 -- -78% -94% new 1.26 346% -- -73% best 0.345 1529% 266% -- The results are the same

    It'll probably be deemed unreadble, but it is a bit quick.

    sub ba2bi2 { my @ba = reverse (@_, (0) x (4 - ( @_%4 || 4 ) ) ); my $result = Math::BigInt->new(0); $result *= 4294967296 , $result += $_ for map{ unpack 'N' , pack 'C4' , @ba[4*$_ .. 4*$_+3] } 0 .. ($#ba/4); return $result; }

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.


      Nice. ++

      Dealing with the data in terms of 4-byte longs reduces the operations by 75%, which, I guess, gives the four fold speed-up.*

      Who says it looks unreadable. ;-) The (4 - ( @_%4 || 4 )) part looked familiar. I had to do something like that a while ago to pad chars to a 4-byte boundary. In terms of your notation this is what I came up with:

      ((4 - @_%4)%4)

      Update: *Reading back I see that bart spotted this exploit earlier.

      --
      John.

      Cool benchmark. I did a couple of variants of your routine to see if removing the map and simplifying the code helped. Interestingly it didn't really. The one called "unpack" is the furthest departure from your code but is neglibly faster when averaged over a few benchmarks. Also interesting was that the "clean" version without the map was consistently slower than yours, as well as any of the modifier versions.

      It would be interesting to put the other offered solutions into the benchmark, but sleep calls.

      BTW, you get a more accurate average if you use a negative timing. Benchmark.pm will run the benchmark until the clock ticks over, whereas unless your count is really high you stand a good chance of being on the misleading side of the tick.

      Cheers,

      ba2bi2(@bytes) +123456781234567812345678123456781234567812345678123456781234567812345 +67812345678 bytearray_to_bigint2(@bytes) +123456781234567812345678123456781234567812345678123456781234567812345 +67812345678 ba2bi2_nomap(@bytes) +123456781234567812345678123456781234567812345678123456781234567812345 +67812345678 ba2bi2_unpack(@bytes) +123456781234567812345678123456781234567812345678123456781234567812345 +67812345678 bytearray_to_bigint(@bytes) +123456781234567812345678123456781234567812345678123456781234567812345 +67812345678 ba2bi2_clean(@bytes) +123456781234567812345678123456781234567812345678123456781234567812345 +67812345678 Benchmark: running ba2bi2, clean, mult, nomap, shift, unpack, each for at least 10 CPU s +econds... ba2bi2: 13 wallclock secs (10.76 usr + 0.01 sys = 10.77 CPU) @ 69 +.21/s (n=745) clean: 12 wallclock secs (10.29 usr + 0.00 sys = 10.29 CPU) @ 68 +.55/s (n=705) mult: 12 wallclock secs (10.12 usr + 0.00 sys = 10.12 CPU) @ 5 +.24/s (n=53) nomap: 12 wallclock secs (10.08 usr + 0.00 sys = 10.08 CPU) @ 69 +.48/s (n=700) shift: 12 wallclock secs (10.28 usr + 0.02 sys = 10.30 CPU) @ 6 +.99/s (n=72) unpack: 12 wallclock secs (10.13 usr + 0.00 sys = 10.13 CPU) @ 71 +.05/s (n=720) Rate mult shift clean ba2bi2 nomap unpack mult 5.24/s -- -25% -92% -92% -92% -93% shift 6.99/s 33% -- -90% -90% -90% -90% clean 68.5/s 1208% 881% -- -1% -1% -4% ba2bi2 69.2/s 1221% 890% 1% -- -0% -3% nomap 69.5/s 1226% 894% 1% 0% -- -2% unpack 71.0/s 1256% 917% 4% 3% 2% --

      --- demerphq
      my friends call me, usually because I'm late....

        That's interesting. I did have a go at coming up with my own 'clean' version after posting. That's generally the way it works with me. I get something to work, post then try and clean it up and optimise it without breaking it. The biggest headache was working out where and how much padding to add. The rest came relatively easily, if convoluted. I guess I'm a messy thinker.

        Anyway, my best effort at a cleaner version is this, which is somewhat different again. TIMWTDI indeed.

        sub ba2bi3 { my $ba = pack 'C*', (@_, (0) x (4 - ( @_ % 4 || 4 ) ) ); my $result = Math::BigInt->new(0); my $p = length($ba) - 4; do { $result <<= 32; #!4294967296; $result += unpack 'V', substr $ba, $p , 4; } while ($p-=4) >= 0; $result; } print ba2bi3( bigint_to_bytearray( Math::BigInt->new('1234567890' x 8) + ) );
        I had thought that be getting rid of the reverse, this would be be a benefit. It is on smaller numbers:

        11: +1234567890 <<Length and value under test Benchmark: running best, clean, new, org , each for at least 10 CPU seconds best: 11 wallclock secs (10.59 usr + 0.00 sys = 10.59 CPU) @ 20 +8.58/s (n=2208) clean: 11 wallclock secs (10.55 usr + 0.00 sys = 10.55 CPU) @ 21 +3.94/s (n=2256) new: 11 wallclock secs (10.51 usr + 0.00 sys = 10.51 CPU) @ 60 +.26/s (n=633) org: 11 wallclock secs (10.35 usr + 0.00 sys = 10.35 CPU) @ 24 +.36/s (n=252) Rate org new best clean org 24.4/s -- -60% -88% -89% new 60.3/s 147% -- -71% -72% best 209/s 756% 246% -- -3% clean 214/s 778% 255% 3% -- The results are the same

        However, when the numbers get larger, the benefit disappears, swamped by the gains of 4-bytes-at -a-time processing and the cost of BigInt math as here.

        301: +1234567890123456789012345678901234567890 1234567890123456789012345678901234567890 1234567890123456789012345678901234567890 1234567890123456789012345678901234567890 1234567890123456789012345678901234567890 1234567890123456789012345678901234567890 1234567890123456789012345678901234567890 12345678901234567890 Benchmark: running best, clean, new, org , each for at least 10 CPU seconds ... s/iter org new clean best org 5.82 -- -77% -94% -94% new 1.31 344% -- -73% -73% clean 0.352 1555% 273% -- -1% best 0.349 1569% 276% 1% -- The results are the same

        I'm intrigued by the differences in the benchmarks on your machine and mine. Even when I run mine on an 80-byte number as you did, I still get gains of well over 1000%. I can only assume that your machine is using a compiled big int library, rather than the pure perl versionI am stuck with. Still, iif the performance difference is only a factor of 2, perl's doing pretty damn well I think.

        For comaprison I stuck your quickest version into my benchmark and ran it with the 300 digit number of my original benchmark and got these results.

        C:\test>229290 Name "main::ba" used only once: possible typo at C:\test\229290.pl lin +e 67. 301: +12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890 12345678901234567890123456789012345678901234567890 Benchmark: running best, clean, demerphq, new, org , each for at least + 10 CPU seconds Rate org new clean demerphq best org 0.173/s -- -78% -94% -94% -94% new 0.770/s 345% -- -73% -73% -73% clean 2.83/s 1533% 267% -- -1% -1% demerphq 2.86/s 1553% 272% 1% -- -0% best 2.86/s 1553% 272% 1% 0% -- The results are the same

        Well done for cracking the no-intermediate-string version (ba2bi2_unpack), I was certain that was possible, but didn't succeed in getting there.

        BTW. I realise I had snipped it off on my original post, but as you can see from the above, I do generally use the clock-tick seconds, rather than the iteration's method of benchmarking these days. Mostly because it gives me a reasonable indication of how long the run is likely to take. I always do nice long run (make myself a cup of coffee and drink it long) before I come to a conclusion.

        Now just think of the numbers you could get on a 64 bit machine :^)


        Examine what is said, not who speaks.

        The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Converting an array of bytes into a BigInt - speed considerations
by castaway (Parson) on Jan 23, 2003 at 12:29 UTC
    Did you look at the pack function, it converts lists of things into a binary structure and returns it in a string.

    C.

      Yes, I know, but I seldom use pack and unpack. If it's possible to do this with pack I guess I have to read up on it.

      (Perl is really wonderful this way; I've been programming Perl for 6 years and still there are things relatively unknown to me!)

      -- 
      dempa

Re: Converting an array of bytes into a BigInt - speed considerations
by pg (Canon) on Jan 23, 2003 at 16:42 UTC
    Two points:
    1. A tiny change should save you big time. Instead of doing the exponential stuff of 256 every time, you should use a variable to store the current value of 256 ** k from each iteration, and next iteration just multiply it by 256, to get 256 ** (k + 1), see the code.

      Now the complexity is a linear function of your array size, ~ o(n).

    2. After my code change, you don't need a $count any more, but I still want to mention that there is absolutely no need to use BigInt for $count, that's really really huge, is your number that HUGE?
    Be careful, pack and unpack does not fit in the picture.
    sub bytearray_to_bigint { my @array = @_; my $base = Math::BigInt->new('1'); my $result = Math::BigInt->new('0'); foreach my $a (@array) { $result += $a * $base; $base *= 256; } return $result; }
Re: Converting an array of bytes into a BigInt - speed considerations
by pike (Monk) on Jan 23, 2003 at 13:15 UTC
    I guess that it is probably the exponentiation that slows you down (as bart remarked). The easiest thing I can think of would be to hardcode the powers of 256 in a separate array. So the loop would become:

    my @factors = (1, 256, 65536, ...); foreach my $a (@array) { $result += $a * $factors[$count++]; }
    HTH,

    pike

Horner Rule Re: Converting an array of bytes into a BigInt
by Zaxo (Archbishop) on Jan 24, 2003 at 05:17 UTC

    Horner rule for evaluating a polynomial from an array of coefficients:

    sub bytearray_to_bigint { my $result = Math::BigInt->new('0'); while (@_) { $result <<= 8; # was *= 256; $result += pop; } $result; }
    This algorithm is pretty efficient and needs no data tables kept.

    After Compline,
    Zaxo

      Methinks maybe Horner used an 8-bit machine and binary rather than stringyfied numbers :^).


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Converting an array of bytes into a BigInt - speed considerations
by aplonis (Pilgrim) on May 02, 2005 at 01:44 UTC

    I had some problems with earlier examples. So I cobbled up my own version which should work for any N-bit (16, 32, 64, etc) PC.

    use strict; use Math::BigInt; ############################ # BEGIN N-BIT SYSTEM CHECK # ############################ # Calculate the bit-width of cells on this system. sub count_bits { my $i = 0; my $q = ~0x0; # Set all bits while ( $q ) { # All shifted out? $q = $q << 1; # Shift left $i++; # Accum shifts } return $i; # Return bit-width } my $bits = count_bits; # System bit-width. my $cells = int( $bits / 8 ); # max bytes in array my $mask = ~0x0; # Full mask = 0xF* my $top_bit = ~( $mask >> 1 ); # High mask = 0x80* ########################## # END N-BIT SYSTEM CHECK # ########################## # Convert a BigInt to an array. sub bint_to_array { my $quo = shift; my $msk = Math::BigInt->new($mask); my @cls; until ( $quo->is_zero() ) { my $rem = $quo->copy(); # Lest modify ruin. $rem->band($msk); # REM = LSC $quo->brsft($bits); # QUO = Cells above unshift @cls, $rem->bstr(); # Keep remainder } return @cls; } # Convert an array to a BigInt. sub array_to_bint { my @cls = @_; my $out = Math::BigInt->new(0); my $i = 0; while ( @_ ) { my $lsc = Math::BigInt->new( pop @_ ); # LSC $lsc->blsft($bits * $i); # Shift up $out->bior($lsc); # OR to LSC ++$i; } return $out; } sub demo_array_bint { print "\n\nDemo of bint/array conversion... \n"; my $bint_1 = new Math::BigInt($_[0]); print "\tINPUT DECIMAL = ", $bint_1->bstr(), "\n"; my @array = bint_to_array($bint_1); print "\tTHRUPUT ARRAY = ", join ', ', @array, "\n"; my $bint_2 = array_to_bint(@array); print "\tOUTPUT DECIMAL = ", $bint_2->bstr(), "\n"; } demo_array_bint('1234567890' x 6);

    Output looks like this on NetBSD 2.0 using Perl 5.8.6.

    baal: {39} perl foo.pl Demo of bint/array conversion... INPUT DECIMAL = 123456789012345678901234567890123456789012345 +678901234567890 THRUPUT ARRAY = 19, 2868184292, 3156107799, 1065854007, 23524 +60956, 2362438038, 3460238034, OUTPUT DECIMAL = 123456789012345678901234567890123456789012345 +678901234567890 baal: {40}

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://229290]
Approved by Tanalis
Front-paged by Tanalis
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-04-24 02:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found