Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^6: [OT] The interesting problem of comparing bit-strings. (More info.)

by BrowserUk (Patriarch)
on Mar 25, 2015 at 13:34 UTC ( [id://1121302]=note: print w/replies, xml ) Need Help??


in reply to Re^5: [OT] The interesting problem of comparing bit-strings. (More info.)
in thread [OT] The interesting problem of comparing bit-strings.

Might be that I'm totally wrong, but I think that as long as you're trying to consider 1. Incrementing haystack by 1-bit. an important primary op, there's something wrong.

Sorry. I should have pointed out that Incrementing haystack by 1-bit isn't quite what it sounds like.

I used the expression to match up to the strstr() implementation I posted.

When in that algorithm they increment the haystack pointer (char*), so as to compare the needle at the next byte offset, I need to effectively increment the haystack by 1 bit.

What that actually entails is a single machine code instruction that shifts 1 64-bit register left by one bit whilst bringing in a new bit from another 64-bit register containing the (preloaded) next (to the right) quad from the haystack.

That's the __shiftleft128() intrinsic which is implemented as a single shld instruction.

But, I also need to detect when I transit a 64-bit boundary so that I can preload the next quad, and reset the shift counter. The whole thing looks like this (very close to what I had above):

U64 nextBit( U64 **p, U8 *o ) { // p: pointer to poin +ter to current quad; o: pointer to current offset if( ++( *o ) == 64 ) { // increment the offs +et and detect transitions across quad boundaries. *o = 0; // reset the offset. ++*p; // increment the quad + pointer. } // Get the 'current' (unshifted) quad value; and the next return __shiftleft128( *( *p + 1 ), **p, *o ); // return t +he value from the higher location with *o bits from lo shifted in fro +m the right. }

Which, despite that it looks like it does two 64-bit loads from memory every time, once inlined and optimised, it only does one 64-bit load every 64 calls, with the compiler seeing fit to keep the values in registers across calls. In fact, even the shift-left gets optimised away/folded into the same shift left that is used for the haystack in the inner loop.

It's hard to follow (if your unfamiliar with x64 code), but the shld instruction that should appear at the bottom of the outer loop (just above the label $LN14@bvSearch:, has been folded into the shld that appears near the top of the inner loop, due to the C code h = nextQuad( &hp, oHay );, by virtue of the rather complex addressing mode used for the register loads:

mov rax, QWORD PTR [r11+r9+16] mov r8, QWORD PTR [r11+r9+8]

The relevant part of the optimised assembler output:

; 41 : while( hay < eHay ) { // until we +reach the end of the haystack cmp rcx, r12 jae SHORT $LN5@bvSearch mov r11, rcx sub r11, r9 npad 3 $LL6@bvSearch: ; 42 : hp = hay; // copy of t +he haystack pointer ; 43 : np = ndl; // copy of t +he needle pointer mov r9, r13 $LL4@bvSearch: ; 44 : //TAG; ; 45 : do { ; 46 : // end of needle; match completed; return current + value of haystack pointer ; 47 : if( np >= eNdl ) { cmp r9, rdi jae SHORT $LN19@bvSearch ; 55 : } ; 56 : h = nextQuad( &hp, oHay ); // get the n +ext quad from haystack mov rax, QWORD PTR [r11+r9+16] mov r8, QWORD PTR [r11+r9+8] ; 57 : n = nextQuad( &np, oNdl ); // get the n +ext quad from needle mov rdx, QWORD PTR [r9+8] add r9, 8 movzx ecx, sil shld r8, rax, cl mov rax, QWORD PTR [r9+8] movzx ecx, bpl shld rdx, rax, cl ; 58 : //printf( "\rhp:%p->h:%16I64x[%s] np:%p->n:%16I64x[%s]\t", hp +, h, toBin( h ), np, n, toBin( n ) ); ; 59 : } while( h == n ); // while the + quads match cmp r8, rdx je SHORT $LL4@bvSearch ; 60 : nextBit( &hay, &oHayCopy ); // Got here +means a mismatch; get the next bit from the haystack inc bl cmp bl, 64 ; 00000040H jne SHORT $LN14@bvSearch xor bl, bl add r10, 8 add r11, 8 ; 41 : while( hay < eHay ) { // until we +reach the end of the haystack cmp rcx, r12 jae SHORT $LN5@bvSearch mov r11, rcx sub r11, r9 npad 3 $LL6@bvSearch: ; 42 : hp = hay; // copy of t +he haystack pointer ; 43 : np = ndl; // copy of t +he needle pointer mov r9, r13 $LL4@bvSearch: ; 44 : //TAG; ; 45 : do { ; 46 : // end of needle; match completed; return current + value of haystack pointer ; 47 : if( np >= eNdl ) { cmp r9, rdi jae SHORT $LN19@bvSearch ; 55 : } ; 56 : h = nextQuad( &hp, oHay ); // get the n +ext quad from haystack mov rax, QWORD PTR [r11+r9+16] mov r8, QWORD PTR [r11+r9+8] ; 57 : n = nextQuad( &np, oNdl ); // get the n +ext quad from needle mov rdx, QWORD PTR [r9+8] add r9, 8 movzx ecx, sil shld r8, rax, cl mov rax, QWORD PTR [r9+8] movzx ecx, bpl shld rdx, rax, cl ; 58 : //printf( "\rhp:%p->h:%16I64x[%s] np:%p->n:%16I64x[%s]\t", hp +, h, toBin( h ), np, n, toBin( n ) ); ; 59 : } while( h == n ); // while the + quads match cmp r8, rdx je SHORT $LL4@bvSearch ; 60 : nextBit( &hay, &oHayCopy ); // Got here +means a mismatch; get the next bit from the haystack inc bl cmp bl, 64 ; 00000040H jne SHORT $LN14@bvSearch xor bl, bl add r10, 8 add r11, 8 $LN14@bvSearch: ; 40 : ; 41 : while( hay < eHay ) { // until we +reach the end of the haystack cmp r10, r12 jb SHORT $LL6@bvSearch $LN5@bvSearch: ; 61 : }

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
codeb

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1121302]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2024-04-20 09:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found