Using a mod-6 wheel is quite common. ntheory (aka Math::Prime::Util) uses a mod-30 wheel which is nice because their are 8 candidates per block, which packs nicely in a byte. The Sieve of Atkin is a twist on mod-30 (among other things, it's mod-60 which lets it skip 2,3,5). Terje Mathisen wrote a neat little monolithic mod-30 sieve in 1998 you can find on the net. I've seen a mod-2310 sieve but it seems very unwieldy for very little gain. PrimeSieve has 3 sieves for use at different sizes. A mod-30 sieve for small sizes, a mod-210 wheel for medium, and an optimized version of Oliveira e Silva's sieve for large sizes.
Quesada and Van Pelt wrote about a mod-30 sieve in 1996 (they cite Luo's 1989 paper), Sorenson in 1991. He even has a section discussing their agreements and disagreements with Luo's paper as well a Pritchard's earlier paper. Sorenson has a paper on Arxiv from March 2015 that I've been meaning to look at, though I don't think it will change anything (e.g. "The proof is a bit artificial and of theoretical interest only. Coding this in
practice seems daunting."). People are still working on these things.
There is no question that primesieve is fast. It's great stuff, and Kim is a sharp developer who has been working on this for many years. He also has a slightly different target -- at large sizes the TOeS algorithm uses a lot of memory. I have tried to keep memory use more constrained. This comes up a lot with tables and precalculation, where I really can't do too much, as most users will probably never even call whatever function I'm optimizing, and I get a lot of feedback about trying to keep the initial size down. primesieve, and its sister project primecount, strive to be fast and offer C and C++ APIs.