PerlMonks  

Re^2: Efficient bit-twiddling in Perl.

by BrowserUk (Patriarch)
on Feb 28, 2013 at 17:28 UTC [id://1021080]


in reply to Re: Efficient bit-twiddling in Perl.
in thread Efficient bit-twiddling in Perl.

Before doing the hard part, I looked at the speeds halfway. It is not worth pursuing.

Indeed. My initial thought for this was to use unpack, and I came up with this unholy construct:

unpack 'vccc', join '', map { pack 'b*', $_ }
    unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n;

Which does the job but (not unexpectedly) is an order of magnitude slower than my second attempt in the OP.
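For anyone dissecting that construct, here is a sketch of what each stage produces (the intermediate variable names are mine). One caveat, unless I'm misreading pack's 'b' semantics: 'b*' packs each field in ascending bit order while 'B32' produced it in descending order, so each field comes back bit-reversed. It agrees with plain shift-and-mask here only because every field of 0x80061861 (10000000000001, 100001, 100001, 100001) happens to be a bit-palindrome.

```perl
use strict;
use warnings;

my $n = 0x80061861;

# Stage 1: four big-endian bytes, expanded to a 32-char bit string,
# most significant bit first.
my $bits = unpack 'B32', pack 'N', $n;   # "10000000000001100001100001100001"

# Stage 2: split into 14-, 6-, 6- and 6-char fields.
my @fields = unpack 'a14a6a6a6', $bits;

# Stage 3: re-pack each field LSB-first ('b*') and read the five
# resulting bytes back as a little-endian short plus three chars.
my @vals = unpack 'vccc', join '', map { pack 'b*', $_ } @fields;

# The same fields by plain shift-and-mask, for comparison.
my @ref = ( $n >> 18, ( $n >> 12 ) & 0x3f, ( $n >> 6 ) & 0x3f, $n & 0x3f );

print "@vals\n";   # 8193 33 33 33
print "@ref\n";    # 8193 33 33 33
```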

That was a pretty good saving, but whenever I have asked this type of question in the past, someone has always come up with a better solution.

And indeed, salva has; albeit only a further 20% saving. I also tried using separate lookup tables to save a dereference to no avail:

#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [
    ( $_ & 0x3f000 ) >> 12,
    ( $_ & 0xfc0   ) >>  6,
      $_ & 0x3f
] for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
$bot[ $_ ] =   $_ & 0x3f
    for 0 .. 0x3ffff;

sub stuff{
#    print "@_";
}

our $ITERS //= 10e6;
my $n = 0x80061861;

my $start = time;
for ( 1 .. $ITERS ) {
    stuff(
        ( $n & 0xfffc0000 ) >> 18,  # was 0xffc00000, which drops bits 18..21 of the 14-bit field
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >>  6,
        ( $n & 0x0000003f )
    );
}
printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf "   Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    my $i = $n & 0x3ffff;
    stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
}
printf "  Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( unpack 'vccc', join '', map { pack 'b*', $_ }
        unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
}
printf "(un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

__END__
C:\test>1021064
Shift&and took: 0.000000482421 seconds
   Lookup took: 0.000000386419 seconds
  Lookup2 took: 0.000000547556 seconds
(un)pack took: 0.000005933478 seconds

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^3: Efficient bit-twiddling in Perl.
by kennethk (Abbot) on Feb 28, 2013 at 19:39 UTC
    I'll qualify this result by saying salva's solution doesn't necessarily buy you anything if you are only doing this a few million times. On my machine, I get the timings:
    Generation took: 0.218580069542 seconds
     Shift&and took: 0.000000471000 seconds
        Lookup took: 0.000000395095 seconds
    where generation is calculated with:
    my $start = time;
    for ( 1 .. 100 ) {
        my @lookup; $#lookup = 0x3ffff;
        $lookup[ $_ ] = [
            ( $_ & 0x3f000 ) >> 12,
            ( $_ & 0xfc0   ) >>  6,
              $_ & 0x3f
        ] for 0 .. 0x3ffff;
    }
    printf "Generation took: %.12f seconds\n", ( time() - $start ) / 100;
    I ran the transforms for 10^7 iterations, though frankly all metrics were still a little volatile for my taste. This puts breakeven at 2.9 million iterations and 5% speed up at 4.2 million. YMMV.
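The break-even figure can be checked directly from those timings; a quick sketch (assuming, as above, that table generation is the only extra cost of the lookup approach):

```perl
use strict;
use warnings;

# Timings from the post above.
my $generation = 0.218580069542;   # one-off table build
my $shift_and  = 0.000000471000;   # per-iteration shift & mask
my $lookup     = 0.000000395095;   # per-iteration AoA lookup

# The table pays for itself once the accumulated per-iteration
# saving covers the build cost.
my $breakeven = $generation / ( $shift_and - $lookup );
printf "break-even at %.1f million iterations\n", $breakeven / 1e6;
# break-even at 2.9 million iterations
```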

    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

      The lookup table generation is 2/10ths of a second, a one-off cost at startup.

      Completely insignificant compared to the cost of the inner-loop lookup code, which could theoretically be executed 4 billion times. (Once for each 32-bit number.) More typically it runs from the mid hundreds of millions to the low thousands of millions of times.

      A small saving overall, but this is only one part of the inner-loop code I am optimising; the run time is already down from a couple of weeks to ~4 days.


Re^3: Efficient bit-twiddling in Perl.
by ggoebel (Sexton) on Mar 01, 2013 at 03:43 UTC

    Bit::Vector, as you might guess, isn't any better...

    use strict;
    use warnings;
    use Time::HiRes qw[ time ];
    use Bit::Vector;

    my @lookup; $#lookup = 0x3ffff;
    $lookup[ $_ ] = [
        ( $_ & 0x3f000 ) >> 12,
        ( $_ & 0xfc0   ) >>  6,
          $_ & 0x3f
    ] for 0 .. 0x3ffff;

    my( @nxt, @mid, @bot );
    $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
    $mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
    $bot[ $_ ] =   $_ & 0x3f
        for 0 .. 0x3ffff;

    sub stuff{
    #    print "@_";
    }

    our $ITERS //= 10e6;
    my $n = 0x80061861;

    my $start = time;
    for ( 1 .. $ITERS ) {
        stuff(
            ( $n & 0xfffc0000 ) >> 18,  # was 0xffc00000, which drops bits 18..21 of the 14-bit field
            ( $n & 0x0003f000 ) >> 12,
            ( $n & 0x00000fc0 ) >>  6,
            ( $n & 0x0000003f )
        );
    }
    printf "  Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for ( 1 .. $ITERS ) {
        stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
    }
    printf "     Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for ( 1 .. $ITERS ) {
        my $i = $n & 0x3ffff;
        stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
    }
    printf "    Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for ( 1 .. $ITERS ) {
        stuff( unpack 'vccc', join '', map { pack 'b*', $_ }
            unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
    }
    printf "   (un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $n = "$n";    # NB: stringifies to decimal, so new_Hex below misparses it (should be new_Dec)
    my $vector;
    $start = time;
    for ( 1 .. $ITERS ) {
        $vector = Bit::Vector->new_Hex( 32, $n );
        stuff(
            $vector->Chunk_Read( 14, 18 ),
            $vector->Chunk_Read(  6, 12 ),
            $vector->Chunk_Read(  6,  6 ),
            $vector->Chunk_Read(  6,  0 )
        );
    }
    printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    __END__
    C:\test>perl bv.pl
      Shift&and took: 0.000000551174 seconds
         Lookup took: 0.000000272709 seconds
        Lookup2 took: 0.000000335823 seconds
       (un)pack took: 0.000003393702 seconds
    Bit::Vector took: 0.000005372898 seconds
      The Bit::Vector code needs new_Dec, not new_Hex. Here are some additions to your code showing Data::BitStream::XS and two alternatives using Bit::Vector. Neither is an improvement over the fast, simple methods, but they do show that both are at least reasonable solutions.
      use Data::BitStream::XS;    # needed for the first addition below

      $start = time;
      my $stream = Data::BitStream::XS->new;
      # $stream->write(32, $n) for 1 .. $ITERS;   # Below is faster
      $stream->put_raw( pack( "N", $n ) x $ITERS, 32 * $ITERS );
      $stream->rewind_for_read;
      for ( 1 .. $ITERS ) {
          stuff( $stream->read(14), $stream->read(6),
                 $stream->read(6),  $stream->read(6) );
      }
      printf "   D::B::XS took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          my $vector = Bit::Vector->new_Dec( 32, $n );
          stuff( $vector->Chunk_Read( 14, 18 ),
                 $vector->Chunk_Read(  6, 12 ),
                 $vector->Chunk_Read(  6,  6 ),
                 $vector->Chunk_Read(  6,  0 ) );
      }
      printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      my $vector = Bit::Vector->new( 32 * $ITERS );
      $vector->Word_List_Store( ($n) x $ITERS );
      for ( 0 .. $ITERS - 1 ) {
          stuff( $vector->Chunk_Read( 14, 32 * $_ + 18 ),
                 $vector->Chunk_Read(  6, 32 * $_ + 12 ),
                 $vector->Chunk_Read(  6, 32 * $_ +  6 ),
                 $vector->Chunk_Read(  6, 32 * $_      ) );
      }
      printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;
      I get on my computer:
        Shift&and took: 0.000000193217 seconds
           Lookup took: 0.000000180829 seconds
          Lookup2 took: 0.000000215928 seconds
         (un)pack took: 0.000001851533 seconds
         D::B::XS took: 0.000000643722 seconds
      Bit::Vector took: 0.000001739788 seconds
      Bit::Vector took: 0.000000957755 seconds
      It really depends on what the usage model is. Both Data::BitStream and Bit::Vector are going to suffer if you insist on making a new object for each word. They will do much better if you let them create a single object and fill it, then pull the data out in arbitrary sized chunks. If the OP had all the values in one chunk or array then this would be more appropriate.

      One could also write a new coding role for Data::BitStream to handle retrieving the values using code similar to one of the lookup methods, but there would have to be some real reason to want a stream vs. just doing it using one of the lookup methods.

Re^3: Efficient bit-twiddling in Perl.
by salva (Canon) on Mar 01, 2013 at 09:43 UTC
    Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

    If you use a random $n, you will see that the table approach becomes actually quite slower than the simple one.

    I have tried encoding the table in other ways, but have not been able to find one that is good enough:

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];

    my @lookup; $#lookup = 0x3ffff;
    $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
        for 0 .. 0x3ffff;

    my( @nxt, @mid, @bot );
    $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
    $mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
    $bot[ $_ ] =   $_ & 0x3f
        for 0 .. 0x3ffff;

    my (@lookup3); $#lookup3 = 0x3ffff;
    $lookup3[ $_ << 6 ] = [ $_ >> 6, $_ & 0x3f ] for 0 .. 0xfff;

    my $lookup4 = 'x' x ( 3 * ( 1 << 18 ) );
    $lookup4 = '';
    $lookup4 .= pack CCC => $_ >> 12, ( $_ >> 6 ) & 0x3f, $_ & 0x3f for 0 .. 0x3ffff;

    my $lookup6 = 'x' x ( 2 * ( 1 << 12 ) );
    $lookup6 = '';
    $lookup6 .= pack CC => $_ >> 6, $_ & 0x3f for 0 .. 0xfff;

    print "tables generated";

    our $ITERS //= 10e6;
    my @n = map int( rand( 1 << 18 ) ), 1 .. $ITERS;
    print "sample data generated";

    sub stuff{
    #    print "@_";
    }

    my $start = time;
    for my $n ( @n ) {
        stuff( ( $n ) >> 18,
               ( $n & 0x0003f000 ) >> 12,
               ( $n & 0x00000fc0 ) >>  6,
               ( $n & 0x0000003f ) );
    }
    printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for my $n ( @n ) {
        stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
    }
    printf "   Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for my $n ( @n ) {
        stuff( $n >> 18, @{ $lookup3[ $n & 0x3ffc0 ] }, $n & 0x3f );
    }
    printf "  Lookup3 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for my $n ( @n ) {
        stuff( $n >> 18, unpack CCC => substr( $lookup4, 3 * ( $n & 0x3ffff ), 3 ) );
    }
    printf "  Lookup4 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for my $n ( @n ) {
        stuff( $n >> 18, unpack 'x' . ( 3 * ( $n & 0x3ffff ) ) . 'CCC' => $lookup4 );
    }
    printf "  Lookup5 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for my $n ( @n ) {
        stuff( $n >> 18, unpack( CC => substr( $lookup6, ( $n & 0x3ffc0 ) >> 5, 3 ) ), $n & 0x3f );
    }
    printf "  Lookup6 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    $start = time;
    for my $n ( @n ) {
        stuff( $n >> 18, unpack( 'x' . ( ( $n & 0x3ffc0 ) >> 5 ) . 'CC', $lookup6 ), $n & 0x3f );
    }
    printf "  Lookup7 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

    __END__
    Shift&and took: 0.000000783860 seconds
       Lookup took: 0.000001267049 seconds
      Lookup3 took: 0.000001018672 seconds
      Lookup4 took: 0.000001903985 seconds
      Lookup5 took: 0.000002110766 seconds
      Lookup6 took: 0.000001607903 seconds
      Lookup7 took: 0.000001791258 seconds
      Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

      The version of my benchmark I posted was the one I use -- in conjunction with -N=1 and uncommenting the print "@_"; -- to check that the output from all the methods is correct.

      For timing runs, I operate on the loop counter $_, which, when -N=10e6, is enough to exercise the full range of the table 38 times.

      And I still get a 20%+ improvement from the AoA lookup:

      #! perl -slw
      use strict;
      use Time::HiRes qw[ time ];

      my @lookup; $#lookup = 0x3ffff;
      $lookup[ $_ ] = [
          ( $_ & 0x3f000 ) >> 12,
          ( $_ & 0xfc0   ) >>  6,
            $_ & 0x3f
      ] for 0 .. 0x3ffff;

      my( @nxt, @mid, @bot );
      $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
      $mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
      $bot[ $_ ] =   $_ & 0x3f
          for 0 .. 0x3ffff;

      our $ITERS //= 10e6;

      sub stuff{
          #print "@_";
      }

      my $n = 0x80061861;

      my $start = time;
      for ( 1 .. $ITERS ) {
          stuff(
              ( $_ & 0xfffc0000 ) >> 18,  # was 0xffc00000, which drops bits 18..21 of the 14-bit field
              ( $_ & 0x0003f000 ) >> 12,
              ( $_ & 0x00000fc0 ) >>  6,
              ( $_ & 0x0000003f )
          );
      }
      printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          stuff( $_ >> 18, @{ $lookup[ $_ & 0x3ffff ] } );
      }
      printf "   Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          my $i = $_ & 0x3ffff;
          stuff( $_ >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
      }
      printf "  Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          stuff( unpack 'vccc', join '', map { pack 'b*', $_ }
              unpack 'a14a6a6a6', unpack 'B32', pack 'N', $_ );
      }
      printf "(un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      Results

      C:\test>1021064
      Shift&and took: 0.000000606833 seconds
         Lookup took: 0.000000485629 seconds
        Lookup2 took: 0.000000669586 seconds
      (un)pack took: 0.000007564075 seconds

      In general, I've found it very difficult to detect, much less measure, any discernible effect from CPU caching in Perl code on my machine. It shows up readily in tight loops in C code, but Perl's memory allocation is so cache-unfriendly that it rarely seems to come into play.
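A minimal sketch of the sort of probe involved (assuming List::Util's shuffle; the structure is mine): run identical lookups over the same indices in sequential and in shuffled order, so any timing gap beyond noise is attributable to cache behaviour rather than to the work done.

```perl
use strict;
use warnings;
use Time::HiRes qw[ time ];
use List::Util qw[ shuffle ];

# The same AoA lookup table as in the benchmarks above.
my @lookup = map [ ( $_ >> 12 ) & 0x3f, ( $_ >> 6 ) & 0x3f, $_ & 0x3f ],
    0 .. 0x3ffff;

my @seq  = map { $_ & 0x3ffff } 0 .. 999_999;   # walks the table in order
my @rand = shuffle @seq;                        # same indices, random order

my %sum;
for my $name ( 'sequential', 'shuffled' ) {
    my $set   = $name eq 'sequential' ? \@seq : \@rand;
    my $sink  = 0;                              # keep the lookups live
    my $start = time;
    $sink += $lookup[ $_ ][1] for @$set;
    $sum{$name} = $sink;
    printf "%10s: %.6f seconds\n", $name, time() - $start;
}
```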

      I would be interested to see the output from the above on your machine if you have a couple of minutes.


        $ perl /tmp/1021241.pl
        Shift&and took: 0.000000844352 seconds
           Lookup took: 0.000000794679 seconds
          Lookup2 took: 0.000000945985 seconds
        (un)pack took: 0.000007257208 seconds
        tables generated
        sample data generated
        Shift&and took: 0.000000283640 seconds
           Lookup took: 0.000000553529 seconds
          Lookup3 took: 0.000000308301 seconds
          Lookup4 took: 0.000000435056 seconds
          Lookup5 took: 0.000000487484 seconds
          Lookup6 took: 0.000000454496 seconds
          Lookup7 took: 0.000000476669 seconds

        Linux 3.4.28-2.20-desktop [openSUSE 12.2 (Mantis)] x86_64
        Xeon(R) CPU E5-1650 0 @ 3.20GHz/1200(12) x86_64 16009 Mb
        This is perl 5, version 16, subversion 2 (v5.16.2) built for x86_64-linux-ld

        I think that proves my point in Re^3: Efficient bit-twiddling in Perl.


        Enjoy, Have FUN! H.Merijn
        Shift&and took: 0.000000288148 seconds
           Lookup took: 0.000000252554 seconds
          Lookup2 took: 0.000000339390 seconds
        (un)pack took: 0.000002284377 seconds

Re^3: Efficient bit-twiddling in Perl.
by Tux (Canon) on Mar 01, 2013 at 07:24 UTC

    I think you can shave off a lot in XS (or Inline::C) if you switch to prebound variables, à la bind_columns() in DBI or Text::CSV_XS, as much of the XS overhead is stack-work. That said, it is then still function calls, so the stack will still be involved. If you pre-bind both the in- AND out-parameters, you don't need to push variables onto the stack or pop return values off the stack. That should speed things up quite a bit.
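A hedged sketch of that pre-binding idea with Inline::C (the function names, and the choice to park the bound SVs in C statics, are mine; untimed): after a one-off bind_split() call, the hot function takes no arguments and returns nothing, so the Perl stack carries no data at all.

```perl
use strict;
use warnings;
use Inline C => <<'END_C';
/* SVs captured once at bind time; Perl passes arguments by alias,
   so these are the caller's own scalars. */
static SV *in_sv, *top_sv, *nxt_sv, *mid_sv, *bot_sv;

void bind_split( SV *in, SV *top, SV *nxt, SV *mid, SV *bot ) {
    in_sv  = SvREFCNT_inc( in  );
    top_sv = SvREFCNT_inc( top );
    nxt_sv = SvREFCNT_inc( nxt );
    mid_sv = SvREFCNT_inc( mid );
    bot_sv = SvREFCNT_inc( bot );
}

/* The hot call: no Perl-stack traffic for inputs or outputs. */
void split_bits( void ) {
    UV n = SvUV( in_sv );
    sv_setuv( top_sv,   n >> 18          );
    sv_setuv( nxt_sv, ( n >> 12 ) & 0x3f );
    sv_setuv( mid_sv, ( n >>  6 ) & 0x3f );
    sv_setuv( bot_sv,   n         & 0x3f );
}
END_C

my ( $n, $top, $nxt, $mid, $bot ) = ( 0, 0, 0, 0, 0 );
bind_split( $n, $top, $nxt, $mid, $bot );    # once, outside any loop

$n = 0x80061861;    # plain assignment reuses the bound SV
split_bits();
print "$top $nxt $mid $bot\n";               # 8193 33 33 33
```

Caveat: this only works while the bound scalars remain the same SVs; redeclaring them with my, or localising them, would silently break the binding.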

    The next step could be looking into op-hooks, so there are no real perl-level function calls any more; but looking at what you already have, I wonder whether it would be worth the effort (other than to learn from it).

    The lookup solutions still have array-index ops behind the scenes. I am stunned by the 20% gain you get, and wonder whether it would differ per perl version and/or architecture.

    Anyway, the solution that is fastest on the machine the problem is solved on is most important, even if another method would be four times faster on z/OS with perl4 :)


