Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re: Detecting whether UV fits into an NV

by roboticus (Chancellor)
on Feb 27, 2020 at 14:31 UTC ( [id://11113490]=note: print w/replies, xml ) Need Help??


in reply to Detecting whether UV fits into an NV

syphilis:

I poked around with it and came up with something for you to figure it out without the looping.

$ cat funky_bits.cc // funky_bits.cpp // // experiment with odd bit handling? // // 20200227 #include <stdio.h> // Some test cases unsigned long long foo[] = { 0xcab312, 0xdeadbeef, 0xdeadbeef0, 0xdeadbeef000, 0x138, 9007199254740992ULL, 18446744073709551615ULL }; // Needs a good name, and the type shenanigans should be cleaned up bool duzzit_fit(unsigned long long t) { // First we need to identify the lowest bit set long long neg_t = -(long long)t; long long last_set = t & neg_t; // Shift it left 52 bits to get location of lowest invalid bit // NOTE: If smallest invalid bit is far enough left, then this wil +l turn into 0 // making the invalid bits also 0, which happens to be OK! long long smallest_invalid = last_set << 52; // Convert it to a bit mask for the smallest invalid bit and those + to the left. unsigned long long valid_bits = smallest_invalid - 1; unsigned long long invalid_bits = ~valid_bits; // Uncomment for debugging //printf( " t:%16llx\n" // " last_set:%16llx\n" // " neg_t:%16llx\n" // "smallest_invalid:%16llx\n" // " valid_bits:%16llx\n" // " invalid_bits:%16llx\n", // t, last_set, neg_t, smallest_invalid, // valid_bits, invalid_bits); // It's a 'valid' number unless one of the upper-order invalid bit +s are set return ! (t & invalid_bits); } int main(int, char **) { for (int z=0; z<sizeof(foo)/sizeof(foo[0]); ++z) { unsigned long long i = foo[z]; bool b = duzzit_fit(i); printf("%lld: %s\n", i, b ? "OK" : "**NOPE**"); } return 0; } $ clang -o funky_bits funky_bits.cc $ ./funky_bits 13284114: OK 3735928559: OK 59774856944: OK 15302363377664: OK 312: OK 9007199254740992: OK -1: **NOPE**

Cleaning it up, adding more test cases, further testing, debugging and simplification are all left as exercises for the reader... ;^D

I was originally thinking of making a lookup table, but thought of this along the way.

Update: fixed the abomination that was the first sentence.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Replies are listed 'Best First'.
Re^2: Detecting whether UV fits into an NV
by syphilis (Archbishop) on Mar 04, 2020 at 07:48 UTC
    I was originally thinking of making a lookup table, but thought of this along the way.

    Thanks for that - I still haven't worked my way through it properly.
    It seems to work fine now that I've changed:
    long long smallest_invalid = last_set << 52; to long long smallest_invalid = last_set << 53;
    Without that change, the following values (for example) wrongly reported "NOPE":
    1844674407366955264, 1844674407366955776, 1844674407366956288, 1844674407366956800, 1844674407366957312, 1844674407366957824, 1844674407366958336, 1844674407366958848, 1844674407366959360, 1844674407366959872, 1844674407366960384, 1844674407366960896, 1844674407366961408, 1844674407366961920, 1844674407366962432, 1844674407366962944, 1844674407366963456, 1844674407366963968, 1844674407366964480, 1844674407366964992
    In the following XS adaptation it's taking over twice as long as uv_fits_double3.
    use warnings; use Benchmark; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'EOC'; int uv_fits_double3(SV * x, ...) { dXSARGS; int i, count = 0; UV arg; for(i = 0; i < items; i++) { arg = SvUV(ST(i)); if(arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) count++; } } return count; } int uv_fits_double_bitfiddle(SV * t, ...) { dXSARGS; int i, count = 0; for(i = 0; i < items; i++) { /* First we need to identify the lowest bit set */ IV neg_t = -(SvIV(ST(i))); IV last_set = SvUV(ST(i)) & neg_t; /* Shift it left 53 bits to get location of lowest invalid bit + * * NOTE: If smallest invalid bit is far enough left, then this wil +l * * turn into 0 making the invalid bits also 0, which happens to be + OK! */ IV smallest_invalid = last_set << 53; UV valid_bits = smallest_invalid - 1; UV invalid_bits = ~valid_bits; /* It's a 'valid' number unless one of the upper-order invalid bit +s are set */ if(!(SvUV(ST(i)) & invalid_bits)) count++; } return count; } EOC @in = (1844674407366955161 .. 1844674407378955161); # @in = (9007199248740992 .. 9007199260740992); # @in = (184467436737095 .. 184467448737095); # @in = (184463440737 .. 184475440737); die "wrong size" unless @in == 12000001; timethese (1, { 'uv_fits_double3' => '$count1 = uv_fits_double3(@in);', 'uv_fits_double_bitfiddle' => '$count2 = uv_fits_double_bitfiddle(@in) +;', }); print "$count1 $count2\n"; die "Error in at least one of the XSubs" unless $count1 == $count2; __END__
    Benchmarking results on my Windows 7 machine with perl-5.30.0 are:
    Benchmark: timing 1 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 0 wallclock secs ( 0.44 usr + 0.00 sys = 0.44 CPU) + @ 2.29/s (n=1) (warning: too few iterations for a reliable count) uv_fits_double_bitfiddle: 1 wallclock secs ( 0.94 usr + 0.00 sys = +0.94 CPU) @ 1.07/s (n=1) (warning: too few iterations for a reliable count) 12000001 12000001
    Can you see room for significant improvement in performance of uv_fits_double_bitfiddle ?

    Thanks for another interesting approach !!

    Cheers,
    Rob

      Rob:

      I'm not experienced in using Inline::C, but I thought I'd take a look at it. My first guess is that the overhead in SvUV(ST(i)) is causing twiddle to be slower, so I tweaked the routine to explicitly pull arg into a UV as you did in double3:

      $ diff pm_11113750_orig.pl pm_11113750_a.pl 26a27 > UV arg; 30,31c31,33 < IV neg_t = -(SvIV(ST(i))); < IV last_set = SvUV(ST(i)) & neg_t; --- > arg = SvUV(ST(i)); > IV neg_t = -arg; > IV last_set = arg & neg_t; 42c44 < if(!(SvUV(ST(i)) & invalid_bits)) count++; --- > if(!(arg & invalid_bits)) count++;

      Doing that change alone made the routines roughly the same speed:

      $ PERL5OPT= perl pm_11113750_orig.pl Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 7 wallclock secs ( 6.70 usr + 0.00 sys = 6.70 CPU) + @ 0.75/s (n=5) uv_fits_double_bitfiddle: 17 wallclock secs (17.36 usr + 0.00 sys = 1 +7.36 CPU) @ 0.29/s (n=5) 46875 46875 $ PERL5OPT= perl pm_11113750_a.pl Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 7 wallclock secs ( 6.75 usr + 0.00 sys = 6.75 CPU) + @ 0.74/s (n=5) uv_fits_double_bitfiddle: 6 wallclock secs ( 6.67 usr + 0.00 sys = +6.67 CPU) @ 0.75/s (n=5) 46875 46875

      Since the twiddle part doesn't have any loops, I would expect it to be significantly faster than a version *with* loops, depending on how clever the optimizer is. So another version I played with was to modify the routine to take two parameters and count the number of values that fit between those two values. That way, I could greatly reduce the impact of the SvUV() function at the cost of only being able to process sequences of numbers. That version confirmed my suspicion that the SvUV() function is slow enough that it swamps the difference in our algorithms.

      int uv_fits_double3x(SV* the_min, SV* the_max) { dXSARGS; UV i_min = SvUV(the_min); UV i_max = SvUV(the_max); UV i; int count = 0; UV arg; for(i = i_min; i < i_max; i++) { arg = i; if(arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) count++; } } return count; } int uv_fits_double_bitfiddle_3x(SV* the_min, SV* the_max) { dXSARGS; UV i_min = SvUV(the_min); UV i_max = SvUV(the_max); UV i; int count = 0; for(i = i_min; i < i_max; i++) { UV arg = i; IV neg = -arg; IV smallest_invalid = (arg & -arg)<<53; UV valid_bits = smallest_invalid-1; UV invalid_bits = ~valid_bits; if (! (arg & invalid_bits)) count++; } return count; }

      Both routines are much faster if they only take two arguments and iterate over the entire range. Only then does the twiddle version show a significant speed advantage at about twice as fast. The double3x version is about 1800 times faster, while fiddle3x is about 2900 times faster:

      $ PERL5OPT= perl pm_11113750_x.pl Name "main::count4" used only once: possible typo at pm_11113750_x.pl +line 147. === Test 1: roughly equivalent speed Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 28 wallclock secs (27.22 usr + 0.08 sys = 27.30 CPU) + @ 0.18/s (n=5) uv_fits_double_bitfiddle: 27 wallclock secs (26.72 usr + 0.00 sys = 2 +6.72 CPU) @ 0.19/s (n=5) 33046878 33046878 (48000004) === TEST 2: twiddle has an advantage if we don't use SvUV() Benchmark: timing 10000 iterations of bitfiddle_3x, double3x, empty_fu +nc, empty_loop... bitfiddle_3x: 18 wallclock secs (18.39 usr + 0.00 sys = 18.39 CPU) @ +543.74/s (n=10000) double3x: 30 wallclock secs (29.55 usr + 0.00 sys = 29.55 CPU) @ 33 +8.44/s (n=10000) empty_func: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) empty_loop: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) 33046875 33046875 48000000 1416858175

      Unfortunately, I've not been successful in making a useful version that's faster in a real-world scenario. My current understanding of things (after playing with it for a few hours) is that the single argument versions get swamped out by the looping and function calling overhead. The functions are just too simple to be aHowever, I've not been successful in making a useful version that's significantly faster (with call overhead and such).

      NOTES: As you can tell from the above, I've not been successful in measuring the function call (empty func) or empty loop versions of the functions, so my benchmarking could be broken, so I'm including the current code, so people can point things out.

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

        My first guess is that the overhead in SvUV(ST(i)) is causing twiddle to be slower

        Yes, I thought of replacing them with a variable, but decided there wouldn't be that much difference between looking at the value of an SV's IV slot and looking at the value of an IV.
        I guess for a few calls there's not much difference, but when you're making 36 million of them it's not hard to believe that things might add up - and I should have thought that through a little better. (Actually, a "lot better".)

        Fixing that alone makes uv_fits_double_bitfiddle almost twice as fast as uv_fits_double3 for me:
        Benchmark: timing 1 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 1 wallclock secs ( 0.45 usr + 0.00 sys = 0.45 CPU) + @ 2.21/s (n=1) (warning: too few iterations for a reliable count) uv_fits_double_bitfiddle: 0 wallclock secs ( 0.25 usr + 0.00 sys = +0.25 CPU)@ 4.02/s (n=1) (warning: too few iterations for a reliable count)
        This is pretty much the type of approach whose existence I had wondered about.
        It had never been pointed out to me that iv & -iv would identify the least significant set bit, and I'm certainly not sharp enough to have ever realized it myself.
        This method is just brilliant ... and it's great that it turns out to be faster, too !!
        I'll certainly be using it (with due accreditation to you) unless further testing, contrary to my expectations, reveals some problem with it.

        Cheers,
        Rob

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11113490]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (4)
As of 2024-04-25 06:56 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found