I'd already coded up an XS version that uses memchr for the search instead of your explicit loop. I don't know why, but the memchr version came out nearly twice as fast (2118/s versus 1172/s in the benchmark below). I've included the searching part of both versions after the results.
                    Rate  avar avar2 ikegami_tr avar2_pos corion moritz avar2_pos_inplace dio_c2 dio_c
avar               109/s    --   -6%       -18%      -40%   -51%   -70%              -84%   -91%  -95%
avar2              116/s    6%    --       -13%      -36%   -48%   -68%              -83%   -90%  -95%
ikegami_tr         133/s   22%   15%         --      -27%   -41%   -64%              -81%   -89%  -94%
avar2_pos          182/s   66%   56%        36%        --   -19%   -50%              -74%   -84%  -91%
corion             224/s  105%   93%        68%       23%     --   -39%              -67%   -81%  -89%
moritz             366/s  235%  215%       175%      101%    63%     --              -47%   -69%  -83%
avar2_pos_inplace  686/s  527%  490%       415%      278%   206%    87%                --   -41%  -68%
dio_c2            1172/s  971%  908%       779%      545%   422%   220%               71%     --  -45%
dio_c             2118/s 1836% 1722%      1488%     1066%   844%   479%              209%    81%    --
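dio_c2
    // Explicit loop: test every destination byte and, where it is NUL,
    // copy the byte at the same offset from the source.
    while ( dpv < dpv_end ) {
        if ( ! *dpv )
            *dpv = *spv;
        ++spv;
        ++dpv;
    }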
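dio_c
    // memchr version: jump straight to the next NUL byte in the destination,
    // then copy the byte at the same offset from the source.
    while ( 1 ) {
        ptr = (char*)memchr( ptr, '\0', ptr_end - ptr );
        if ( ! ( ptr && ptr < ptr_end ) )
            break;
        // ptr - dpv is the offset into the destination buffer
        *ptr = *(ptr - dpv + spv);
        ++ptr;
    }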
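
For anyone who wants to try the two loops outside of XS, here is a minimal standalone sketch of both. The function names fill_nuls_loop and fill_nuls_memchr and the (dst, src, len) signature are hypothetical, since the snippets above show only the inner loops and not the surrounding XS glue. One possible explanation for the speed difference, not verified here, is that a libc memchr can often scan several bytes per step, while the explicit loop tests and branches on every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper names: the post shows only the inner loops. */

/* Byte-by-byte variant (mirrors the dio_c2 loop). */
void fill_nuls_loop(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < len; ++i) {
        if (dst[i] == '\0')
            dst[i] = src[i];   /* fill the hole from the same offset */
    }
}

/* memchr variant (mirrors the dio_c loop). */
void fill_nuls_memchr(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    char *ptr = dst;
    char *const end = dst + len;

    while (ptr < end) {
        /* Skip ahead to the next NUL byte, if there is one. */
        ptr = memchr(ptr, '\0', (size_t)(end - ptr));
        if (ptr == NULL)
            break;
        *ptr = src[ptr - dst]; /* same offset in the source buffer */
        ++ptr;
    }
}
```

Both functions modify dst in place and assume src is at least len bytes long. How much faster the memchr variant is will likely depend on how sparse the NUL bytes are and on the libc in use, so it is worth re-running the benchmark on your own data.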