http://qs321.pair.com?node_id=638715


in reply to Re^3: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) (addr math)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

I'd already coded up an XS version that uses memchr for the search instead of your explicit loop. I don't know why, but the memchr version came out roughly twice as fast. The searching part of each version is included below.

                    Rate  avar avar2 ikegami_tr avar2_pos corion moritz avar2_pos_inplace dio_c2 dio_c
avar               109/s    --   -6%       -18%      -40%   -51%   -70%              -84%   -91%  -95%
avar2              116/s    6%    --       -13%      -36%   -48%   -68%              -83%   -90%  -95%
ikegami_tr         133/s   22%   15%         --      -27%   -41%   -64%              -81%   -89%  -94%
avar2_pos          182/s   66%   56%        36%        --   -19%   -50%              -74%   -84%  -91%
corion             224/s  105%   93%        68%       23%     --   -39%              -67%   -81%  -89%
moritz             366/s  235%  215%       175%      101%    63%     --              -47%   -69%  -83%
avar2_pos_inplace  686/s  527%  490%       415%      278%   206%    87%                --   -41%  -68%
dio_c2            1172/s  971%  908%       779%      545%   422%   220%               71%     --  -45%
dio_c             2118/s 1836% 1722%      1488%     1066%   844%   479%              209%    81%    --

dio_c2

    // Do it
    while ( dpv < dpv_end ) {
        if ( ! *dpv ) *dpv = *spv;
        ++spv;
        ++dpv;
    }

dio_c

    // Do it
    while ( 1 ) {
        ptr = (char*)memchr( ptr, '\0', ptr_end - ptr );
        if ( ! ( ptr && ptr < ptr_end ) ) break;
        *ptr = *(ptr - dpv + spv);
        ++ ptr;
    }
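For anyone who wants to try the two loops outside of XS, here is a minimal standalone sketch in plain C. The function names or_equals_loop and or_equals_memchr and the (dst, src, len) signature are my own assumptions for illustration; the XS glue that supplies the buffers in the real code is not shown and is not reproduced here. Both functions assume src is at least len bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Explicit loop (the dio_c2 approach): test every destination byte. */
static void or_equals_loop(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    char *dpv = dst;
    const char *spv = src;
    char *dpv_end = dst + len;

    while (dpv < dpv_end) {
        if (!*dpv)
            *dpv = *spv;   /* fill a '\0' byte from the source */
        ++spv;
        ++dpv;
    }
}

/* memchr search (the dio_c approach): jump straight to each '\0' byte. */
static void or_equals_memchr(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    char *ptr = dst;
    char *ptr_end = dst + len;

    while (ptr < ptr_end) {
        /* Search only the part of dst that has not been scanned yet. */
        ptr = memchr(ptr, '\0', (size_t)(ptr_end - ptr));
        if (!ptr)
            break;             /* no more '\0' bytes left */
        *ptr = src[ptr - dst]; /* same offset in the source buffer */
        ++ptr;
    }
}
```

Both functions should leave dst in the same state; the only difference is how the '\0' bytes are located. Timings will depend on the C library's memchr and on how often '\0' actually occurs in the data.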

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊


Replies are listed 'Best First'.
Re^5: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer) (addr math)
by tye (Sage) on Sep 13, 2007 at 01:29 UTC

    Yes, on most processors memchr() and related functions end up implementing their loop in a single machine-language instruction. Given how infrequent '\0' bytes are, the extra computation needed when one is found is outweighed by the much more efficient search for them.

    - tye