Usage scenario. Most of the time, the producer will be running on one core and the consumer on another, and they will be producing and consuming from their respective ends of the shared memory structure as fast as they can go. No locking; no synching; no (elective) context switching.
Occasionally, one end or the other will get preempted for some higher-priority thread. At this point, the shared data structure will become either full or empty, depending upon which end is still running. The end that is still running then needs to enter a wait state until the other end gets another timeslice, does its thing, relieves the empty or full state, and wakes the waiting end to continue.
Most of the time, given a correctly sized and well-written buffering data structure, the above scenario is lock-free and wait-free, and requires no system calls (ring3/ring0/ring3 transitions). Both consumer and producer threads are free to run as fast as their processing requirements allow them and utilise their full timeslices. The latter point is the key to maximum utilisation.
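To make the fast path concrete, here is a minimal sketch of a single-producer/single-consumer ring in C++. The names (`SpscRing`, `try_push`, `try_pop`, `spsc_demo_sum`) and the 64-byte alignment are my own assumptions for illustration, not anything from an existing library. When the ring is neither full nor empty, each operation is a couple of atomic loads and stores in user space.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical SPSC ring; Capacity must be a power of two so that the
// free-running indices can be masked instead of wrapped with a modulo.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
    T slots_[Capacity];
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to read; written by the consumer only
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to write; written by the producer only
public:
    // Producer side. Returns false when the ring is full.
    bool try_push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == Capacity) return false;
        slots_[t & (Capacity - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the slot to the consumer
        return true;
    }
    // Consumer side. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = slots_[h & (Capacity - 1)];
        head_.store(h + 1, std::memory_order_release);  // hand the slot back to the producer
        return true;
    }
};

// Demo: one thread pushes 1..n, the calling thread pops and sums them.
// On full/empty this demo just yields, which is only a placeholder for a
// proper wait mechanism.
std::uint64_t spsc_demo_sum(std::uint64_t n) {
    SpscRing<std::uint64_t, 1024> ring;
    std::thread producer([&] {
        for (std::uint64_t i = 1; i <= n; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    std::uint64_t sum = 0;
    for (std::uint64_t got = 0; got < n;) {
        std::uint64_t v = 0;
        if (ring.try_pop(v)) {
            assert(v == got + 1);  // values arrive in FIFO order
            sum += v;
            ++got;
        } else {
            std::this_thread::yield();
        }
    }
    producer.join();
    return sum;
}
```

Calling `spsc_demo_sum(100000)` moves 100000 values between two threads without taking a lock. The `yield()` on the full/empty boundary is itself an elective cede of the CPU, so it marks exactly the spot where a better wait/wake mechanism is wanted.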
If I use suspend/resume, buffer empty/full conditions are guaranteed to require not only multiple calls into the kernel, but also (at least one) very expensive context switch. If I use cond_vars and (unneeded) kernel mutexes, this also means an expensive call into the kernel for every read & write.
The whole point of lock-free & wait-free algorithms is that they avoid both expensive calls into the kernel and expensive elective context switches--i.e. non-pre-emptive ceding of the cpu--in order to make full use of each time-slice allotted.
The point of fast user-space mutexes (futexes) is that the uncontended case is handled entirely in user-space, and is therefore faster.
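The same split between a user-space fast path and a kernel-backed slow path can be sketched for the empty/full hand-off. What follows is only an illustration under my own assumptions: `Parker`, `park_until`, `wake_if_parked` and `parker_demo` are hypothetical names, and a portable `std::mutex`/`std::condition_variable` pair stands in for whatever better primitive the platform offers. The mutex and cond_var are touched only when one side has actually announced that it is going to sleep; otherwise the waker pays for a single atomic load.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: the kernel-backed primitives are used on the slow path only.
class Parker {
    std::atomic<bool> waiting_{false};
    std::mutex m_;
    std::condition_variable cv_;
public:
    // Called by the side that found the buffer empty (or full).
    // `ready` must re-check the shared state with seq_cst atomic loads.
    template <typename Pred>
    void park_until(Pred ready) {
        waiting_.store(true);  // seq_cst: announce intent before re-checking the state
        {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, ready);  // returns immediately if ready() is already true
        }
        waiting_.store(false);
    }
    // Called by the other side after each state change (seq_cst store).
    // Returns true if the slow path was taken.
    bool wake_if_parked() {
        if (!waiting_.load()) return false;  // fast path: one atomic load
        std::lock_guard<std::mutex> lk(m_);  // slow path: serialise with the sleeper
        cv_.notify_one();
        return true;
    }
};

// Demo: hand n tokens from a producer thread to the calling thread
// through an atomic counter, sleeping only when the counter is zero.
long parker_demo(long n) {
    std::atomic<long> available{0};
    Parker consumer_gate;
    std::thread producer([&] {
        for (long i = 0; i < n; ++i) {
            available.fetch_add(1);          // publish one token
            consumer_gate.wake_if_parked();  // usually just a load
        }
    });
    long consumed = 0;
    while (consumed < n) {
        if (available.load() > 0) {
            available.fetch_sub(1);  // safe: this thread is the only consumer
            ++consumed;
        } else {
            consumer_gate.park_until([&] { return available.load() > 0; });
        }
    }
    producer.join();
    assert(available.load() == 0);  // every published token was consumed
    return consumed;
}
```

The sequentially consistent store/load pairs on `waiting_` and on the shared counter are what prevent a lost wake-up: either the waker sees the flag and takes the mutex, or the sleeper's re-check sees the new state and never blocks. When the sleeper does block there is still a trip into the kernel and a context switch, so this narrows the cost to the empty/full boundary rather than removing it.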
The (lock-free/wait-free) algorithms are getting better and better defined. The hardware support (CAS, XCHG and similar SMP atomic instructions) is getting better and better with every new generation of processors.
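As a small illustration of what those instructions buy you, here is a sketch of a lock-free "store the maximum" built on a compare-and-swap retry loop, using standard C++ atomics. The function names (`atomic_store_max`, `cas_demo_max`) are made up for the example. On x86 a `compare_exchange` of this kind typically compiles down to a LOCK CMPXCHG, though that is up to the compiler and target.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raise `target` to `candidate` if `candidate` is larger, without a lock.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int seen = target.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak refreshes `seen` with the current
    // value, so the loop re-tests against what another thread just wrote.
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate, std::memory_order_relaxed)) {
    }
    // Holds as long as every writer goes through this function,
    // because the stored value then never decreases.
    assert(target.load(std::memory_order_relaxed) >= candidate);
}

// Demo: several threads race to record the largest value they generate.
int cas_demo_max(int threads, int per_thread) {
    std::atomic<int> max_seen{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&max_seen, t, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                atomic_store_max(max_seen, t * per_thread + i);
        });
    for (auto& th : pool) th.join();
    return max_seen.load();
}
```

No thread ever blocks here. A thread that loses the race simply retries with the fresher value, and the whole operation stays in user space.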
The current limitations are the locking, syncing and signalling mechanisms, which were designed for single-processor/core IPC purposes. Given that much of the HPC research is done on *nix boxes of one flavour or another, I know there are better mechanisms out there. This thread was meant to be about enlisting help to find them, not to argue about whether they are possible, or even required.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.