http://qs321.pair.com?node_id=638738


in reply to Re^2: Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)
in thread Challenge: CPU-optimized byte-wise or-equals (for a meter of beer)

I just tested this snippet on gcc 4.2.1 and it had the same speed on a two-processor, multi-core machine as it did without the OpenMP annotations as well as when it was compiled by gcc 3.whatever.

#pragma omp parallel for for( ; dsvc < dsvc_end; ++dsvc, ++ssvc ) { if ( ! *dsvc ) *dsvc = *ssvc; }

Just to see if things changed if I simplified the for(;;) loop, this also had no effect. Or perhaps the openmp version got slower. It's been about a 1000 calls slower the last few times I tried it. I'm sure that's just because the loop has to do the pointer math all the time instead of just incrementing like the previous version. OpenMP just isn't getting enabled, I think.

/* Do it. */ #pragma omp parallel for for( o = 0; o < len; ++o ) { if ( ! *( dsvc + o ) ) *(dsvc+o) = *(ssvc+o); }

In the below chart I deleted a pile of uninteresting programs and kept only the ones I was testing and things that performed at least as well as my worst candidate.

Rate moritz_inplace dio_openmp swap dio_noopenmp + avar_tye_c_inplace avar_c_inplace dio_c moritz_inplace 1948/s -- -64% -66% -68% + -69% -73% -83% dio_openmp 5344/s 174% -- -7% -13% + -15% -26% -54% swap 5745/s 195% 8% -- -6% + -9% -20% -50% dio_noopenmp 6124/s 214% 15% 7% -- + -3% -15% -47% avar_tye_c_inplace 6286/s 223% 18% 9% 3% + -- -12% -45% avar_c_inplace 7178/s 269% 34% 25% 17% + 14% -- -38% dio_c 11495/s 490% 115% 100% 88% + 83% 60% --

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊