Re^3: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) (addr math)

by tye (Sage)
on Sep 12, 2007 at 17:39 UTC


in reply to Re^2: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

You are calculating array offsets over and over. The fast way to do this in C is more like:

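    STRLEN len;                      /* SvPV() stores the length in a STRLEN, not an int */
    char* pBeg= SvPV(...,len);
    char* pSrc= pBeg + len;          /* one past the last source byte */
    char* pDst= SvPV(....) + len;    /* one past the last destination byte */
    while( pBeg < pSrc ) {
        --pSrc;                      /* step back before dereferencing, so the */
        --pDst;                      /* byte at offset len is never touched    */
        if( ! *pDst ) {
            *pDst= *pSrc;
        }
    }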
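For readers who want to try the pointer-walking idea outside of XS, here is a minimal sketch as a plain C function. The name `or_equals_bytes` and its signature are hypothetical (they do not appear in the thread), and it assumes both buffers hold at least `len` bytes; in the XS version those pointers and the length would come from SvPV().

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread: for every zero byte in dst,
 * copy in the byte at the same offset in src. Both buffers are assumed
 * to hold at least len bytes. */
static void or_equals_bytes(char *dst, const char *src, size_t len)
{
    const char *src_beg = src;      /* loop stops when we reach the start */
    const char *s = src + len;      /* one past the last source byte */
    char *d = dst + len;            /* one past the last destination byte */

    /* Walk both pointers backwards in lock step; no index arithmetic
     * is recomputed inside the loop. */
    while (src_beg < s) {
        --s;
        --d;
        if (!*d)
            *d = *s;
    }
}
```

Calling `or_equals_bytes(dst, src, len)` modifies `dst` in place; a length of zero leaves it untouched.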

- tye        


Replies are listed 'Best First'.
Re^4: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) (addr math)
by diotalevi (Canon) on Sep 13, 2007 at 01:24 UTC

    I'd already coded up an XS version that uses memchr for the search instead of an explicit loop like yours. I don't know why, but the memchr version (dio_c) came out nearly twice as fast as the explicit loop (dio_c2): 2118/s versus 1172/s. The benchmark results and the searching part of both versions are below.

                        Rate  avar avar2 ikegami_tr avar2_pos corion moritz avar2_pos_inplace dio_c2 dio_c
    avar               109/s    --   -6%       -18%      -40%   -51%   -70%              -84%   -91%  -95%
    avar2              116/s    6%    --       -13%      -36%   -48%   -68%              -83%   -90%  -95%
    ikegami_tr         133/s   22%   15%         --      -27%   -41%   -64%              -81%   -89%  -94%
    avar2_pos          182/s   66%   56%        36%        --   -19%   -50%              -74%   -84%  -91%
    corion             224/s  105%   93%        68%       23%     --   -39%              -67%   -81%  -89%
    moritz             366/s  235%  215%       175%      101%    63%     --              -47%   -69%  -83%
    avar2_pos_inplace  686/s  527%  490%       415%      278%   206%    87%                --   -41%  -68%
    dio_c2            1172/s  971%  908%       779%      545%   422%   220%               71%     --  -45%
    dio_c             2118/s 1836% 1722%      1488%     1066%   844%   479%              209%    81%    --

    dio_c2:

        // Do it
        while ( dpv < dpv_end ) {
            if ( ! *dpv )
                *dpv = *spv;
            ++spv;
            ++dpv;
        }

    dio_c:

        // Do it
        while ( 1 ) {
            ptr = (char*)memchr( ptr, '\0', ptr_end - ptr );
            if ( ! ( ptr && ptr < ptr_end ) )
                break;
            *ptr = *(ptr - dpv + spv);
            ++ ptr;
        }

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      Yes, memchr() and related functions usually end up implementing their loop in a single machine-language instruction on most processors. Because '\0' bytes are relatively infrequent, the extra computation needed each time one is found is outweighed by how much more efficiently they are located.
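The trade-off described here can be sketched as a standalone C function. This is only an illustration: the name `or_equals_memchr` and its signature are hypothetical, and it assumes both buffers hold at least `len` bytes. The library's memchr() does the scanning, and the source offset is computed only when a zero byte is actually found.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the thread: same effect as a byte-by-byte
 * loop, but memchr() locates the zero bytes in dst. Both buffers are
 * assumed to hold at least len bytes. */
static void or_equals_memchr(char *dst, const char *src, size_t len)
{
    char *ptr = dst;
    char *end = dst + len;

    while (ptr < end) {
        /* Jump straight to the next zero byte in the destination, if any. */
        ptr = memchr(ptr, '\0', (size_t)(end - ptr));
        if (ptr == NULL)
            break;

        /* The offset into src is only computed for bytes that need filling. */
        *ptr = src[ptr - dst];
        ++ptr;
    }
}
```

When zero bytes are rare, almost all of the time is spent inside memchr(), which is where the library's optimized scanning pays off; when they are dense, the per-hit overhead could narrow or erase the advantage.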

      - tye        
