Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

by sundialsvc4 (Abbot)
on Apr 05, 2015 at 20:10 UTC ( [id://1122519]=note: print w/replies, xml ) Need Help??


in reply to Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

I think that all such algorithms are rather-fundamentally inapplicable to your particular situation because they, by definition, are working at the smallest possible unit of storage:   the byte.   Whereas, today’s microprocessor architectures are both extremely wide (64 bits and counting), and extremely pipelined.   Data is grabbed into the processor in “cache lines” which are several times larger.   The architecture of the chip bears only a superficial resemblance to the way that it is described to even the assembly-language programmer.

I know that you don’t like to hear from me – that you emphatically don’t want to hear from me – that some folks think that I am talking to hear my head rattle and could not possibly have anything to contribute to your ongoing battle with “unaligned bit strings.”   But maybe, instead, you (folks) should be listening to what I have to say on this point.

You are, as I understand it, always looking for very long bitstrings, in even longer buffers, and always in a machine that has enough RAM to contain a single copy without paging.   Therefore, I suggest that you should strictly go for an algorithm, as I have suggested in the past, which tries to gulp the data in the largest chunks that the native-CPU can directly support, and that while doing so you avoid conditional logic.   You want the pipeline to become full and to remain full, with no branch-conditions to predict and no pipelines to break.   Please .. just try this:

  • There are exactly 64 possibilities for 64-bit words [2..n-1].   In a loop that considers all 64 “shifts” one at a time, search for any one without using if.
  • If there is a probability of lots of consecutive 64-bit words or much-less frequent ones, search for those instead.   You can search for a word in any offset in the buffer (other than the first or the last).
  • Confirm your suspicion either by sampling a few words or simply by testing the entire group of 64-bit words.   The logic is an easy combination of rol and shift and bitwise-ands, none of which requires conditional logic, all of which can be done in registers.   This, too, is important.
  • Only when words [2..n-1] all match do you finally test the first and the last 64-bit words, at the rotation specified, masking off the “don’t care” bits on either side.
  • What superficially appears to be a “brutish” approach, offered by a vainglorious Monk who likes to hear himself talk, actually is designed to play into the hands of the CPU.   The microprocessor’s physical implementation will be optimized even further, taking full advantage of whole cache-lines and more.   The key is that, instead of trying to find an unshifted bitstring in a buffer in which that string can occur in any of 64 alignments, you are searching for one of 64 shifted bitstrings ... and doing it with pipelined, un-conditional, raw speed.

    • Comment on Re: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)

Replies are listed 'Best First'.
Re^2: Why Boyer-Moore, Horspool, alpha-skip et.al don't work for bit strings. (And is there an alternative that does?)
by BrowserUk (Patriarch) on Apr 05, 2015 at 20:35 UTC
    But maybe, instead, you (folks) should be listening to what I have to say on this point.

    Why would I consider what you say this time; when it is the exact opposite of what you said last time.

    In both cases, your words show a complete misunderstanding of the problem; and offer completely meaningless description of completely unworkable approaches to a completely different problem.

    Someone recently described you as "The PerlMonks Village Idiot"; but that just unfair to village idiots, cos they can't help it.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1122519]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others romping around the Monastery: (4)
As of 2024-04-19 00:51 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found