Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^2: [OT] The interesting problem of comparing bit-strings.

by BrowserUk (Patriarch)
on Mar 25, 2015 at 20:32 UTC ( [id://1121341]=note: print w/replies, xml ) Need Help??


in reply to Re: [OT] The interesting problem of comparing bit-strings.
in thread [OT] The interesting problem of comparing bit-strings.

“Bit vector” problems such as this one actually have many different analogs. (Graphics processing, in particular, comes to mind.)

Arrays of rgb or rgba (or CMK or HSA) are not bit vectors.

For instance, if you are routinely looking for bit-strings that are (say) only 20-30 bits long, then there really isn’t any alternative but to pedantically growl through the buffer.

There isn't any alternative. Full stop. Regardless of the length of the needles.

But when a sensible algorithm can locate 1 million bits close to the end of a 1/4 billion bits in 0.25 seconds, who cares if you have to "pedantically growl".

Strategies such as the one which I previously suggested would be of no real value,

It is of no value. Full stop.

Even if you try to press strategies such as Moore into service, you find that they are not nearly so effective in this situation

if by "Moore" you mean Boyer Moore; you're wrong again. It requires some modifications, but it can be very useful. Just not with microcoded scan loops.

Hence, the necessity to “divide and conquer.” To find some rune by which you can quickly eliminate all places where the data could not be.

There's no necessity. And anything that takes longer than 1/4 of a second is pointless.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
  • Comment on Re^2: [OT] The interesting problem of comparing bit-strings.

Replies are listed 'Best First'.
Re^3: [OT] The interesting problem of comparing bit-strings.
by syphilis (Archbishop) on Apr 04, 2015 at 01:05 UTC
    But when a sensible algorithm can locate 1 million bits close to the end of a 1/4 billion bits in 0.25 seconds, who cares if you have to "pedantically growl" ..... And anything that takes longer than 1/4 of a second is pointless.

    Out of curiosity, I tested the gmp library's capability with the above scenario - under the assumption that "1/4 billion" is "250 million".
    The haystack was pseudo-randomly generated, and the needle starts 19 bits from the left hand end of the haystack.
    I used a combination of Math::GMPz and Inline::C.

    It took 3 seconds to find the needle in the haystack (starting from the right hand end of the haystack) on Windows 7 64 bit.

    In some parts of the world "1/4 billion" is "25 million" and the program then, of course, finds the needle 10 times quicker.

    I'm not at all impressed with the code that I wrote in order to check this. But I was impressed that a general usage library could provide such a speedy implementation of the approach.
    Unfortunately, it's not quick enough to be pointful ;-)
    use strict; use warnings; use Math::GMPz qw(:mpz); use Time::HiRes; use Inline C => Config => USING => 'ParseRegExp', BUILD_NOISY => 1, INC => '-IC:/MinGW/msys/1.0/local/include', LIBS => '-LC:/MinGW/msys/1.0/local/lib -lgmp', TYPEMAPS => './typemap', # ships with Math::GMPz ; use Inline C => <<'EOC'; #include <gmp.h> int start(mpz_t * haystack, int haystack_size, mpz_t * needle, int needle_size) { int h_bit, n_bit, nless1 = needle_size -1; for(h_bit = nless1; h_bit < haystack_size; h_bit++) { for(n_bit = 0; n_bit < needle_size; n_bit++) { if(mpz_tstbit(*haystack, h_bit - needle_size + n_bit) != mpz_tstbit(needle, n_bit)) break; if(n_bit == nless1) return h_bit; } } return 0; } EOC my ($t1, $t2, $match); my ($haystack_size, $needle_size) = (250e6, 1e6); # generate the haystack - takes a while my $h_string = rstring($haystack_size); # generate a needle that's guaranteed to match , starting # at 19 bits from the lh end - ie at bit 249999981 my $n_string = substr($h_string, 19, $needle_size); # create the haystack gmp vector. (We need an additional # leading "1" bit to ensure that leading zero bits aren't ignored.) my $haystack = Math::GMPz->new( "1" . $h_string , 2); # create the needle gmp vector. (We need an additional # leading "1" bit to ensure that leading zero bits aren't ignored.) my $needle = Math::GMPz->new("1" . $n_string, 2); # Check that we've got the right number of bits. print Rmpz_sizeinbase($haystack, 2), " ", Rmpz_sizeinbase($needle, 2), "\n"; $t1 = Time::HiRes::time(); $match = start($haystack, $haystack_size, $needle, $needle_size); $t2 = Time::HiRes::time(); print $match, " ", $t2 - $t1, "\n"; sub rstring { die "haystack size must be a multiple of 10" if $_[0] % 10; my $ret; my $its = $_[0] / 10; $ret .= sprintf("%010b",int rand 1024) for 1 .. $its; if(length($ret) != $_[0] || $ret =~ /[^01]/) { die "Error in sub rstring"; } return $ret; } # Outputs: # 250000001 1000001 # 249999981 3.16680598258972
    Cheers,
    Rob
      under the assumption that "1/4 billion" is "250 million".

      You picked out a quote from a reply to a "non-combatant"; someone for whom the differences between 'GB', 'Gb' and 'GiB' would be lost.

      For the sake of precision, "1/4 billion bits", in this case is, 1024**3/4 == 268435456 bits == 33554432 bytes == 32 MiB.

      BTW: searching from the end is an intriguing possibility...but gains play losses if the needle is to be found at the beginning (left-most; least-most; lower) position in the haystack.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1121341]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (1)
As of 2024-04-25 00:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found