TT doesn't do anything with shared memory. It keeps the compiled templates in memory in each process, and shares them between processes as generated Perl code on disk. I thought HTC did the same thing. Accessing shared memory from Perl is usually pretty slow because everything has to be serialized, so this type of setup ends up being faster in most cases. It wouldn't surprise me if TT used more memory than HTC, but I expect they aren't that far apart. A memory benchmark would certainly be interesting, but it's very hard to get accurate numbers that account for things like copy-on-write sharing.
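Anyway, recreating the TT object on every request is a common mistake among new TT users, because the in-memory cache of compiled templates is thrown away along with the object. The cost is even higher when templates use many includes, since every included template has to be loaded again as well.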
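
For what it's worth, here is a rough sketch of the pattern I mean: build the Template object once at startup and reuse it in every request. The template name, the temporary directory, and the handle_request sub are made up for illustration. COMPILE_EXT is the standard TT option that writes the compiled Perl next to the template on disk.

```perl
use strict;
use warnings;
use Template;
use File::Temp qw(tempdir);

# Hypothetical template directory, created here only so the sketch
# runs on its own. In a real app this is your template root.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/hello.tt" or die "cannot write template: $!";
print {$fh} "Hello, [% name %]!\n";
close $fh;

# Create the Template object ONCE, at startup (for example in a
# package-level variable under mod_perl), so its in-memory cache of
# compiled templates survives from one request to the next.
my $tt = Template->new({
    INCLUDE_PATH => $dir,
    COMPILE_EXT  => '.ttc',   # also save compiled Perl on disk
}) or die Template->error();

# Hypothetical request handler: it reuses $tt instead of calling
# Template->new on every request.
sub handle_request {
    my ($name) = @_;
    my $output = '';
    $tt->process('hello.tt', { name => $name }, \$output)
        or die $tt->error();
    return $output;
}

print handle_request($_) for qw(Alice Bob);
```

The second call to handle_request finds hello.tt already compiled in the object's memory cache. If Template->new were moved inside the sub, each call would start with an empty cache and would have to load the template (and anything it includes) again.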