I feel your pain. The test suite for Handel has xml/tt output tests for
its AxKit and Template Toolkit plugins. I've got oodles of template
pages using the components, whose output I compare to static .out files
that contain the expected output. Right now, that's 2000+ tests, with
about 200 of those being the actual page output tests using all of the
possible tag variations.
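
For anyone who hasn't seen the pattern, here is a minimal sketch of the
"render a page, compare it to a static .out file" idea. It is written in
Python rather than Perl purely for illustration, and it is not Handel's
actual test code: render(), compare_to_expected(), run_case() and the
cart.out file are all hypothetical names, and render() is only a stand-in
for a real template engine such as Template Toolkit or AxKit.

```python
import difflib
import tempfile
from pathlib import Path


def render(template_text, values):
    """Stand-in for a real template engine (assumption: str.format is enough here)."""
    return template_text.format(**values)


def compare_to_expected(actual, expected_path):
    """Return a unified diff (list of lines) between the .out file and the actual output.

    An empty list means the rendered output matches the expected file exactly.
    """
    expected = Path(expected_path).read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile=str(expected_path),
            tofile="actual",
            lineterm="",
        )
    )


def run_case(template_text, values, expected_path):
    """Render one template page and report (passed, diff_lines)."""
    diff = compare_to_expected(render(template_text, values), expected_path)
    return (not diff, diff)


if __name__ == "__main__":
    # Hypothetical page and expected-output file, created in a temp directory.
    workdir = Path(tempfile.mkdtemp())
    expected_file = workdir / "cart.out"
    expected_file.write_text("<cart>3</cart>\n", encoding="utf-8")

    passed, diff = run_case("<cart>{count}</cart>\n", {"count": 3}, expected_file)
    print("ok" if passed else "not ok")
    for line in diff:
        print(line)
```

Returning the diff instead of a bare pass/fail is the part that saves time:
when an expected file is wrong the first time around, the diff shows which
line to fix.
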
Every time I write a new plugin, or a new tag in the plugin, I waste tons
of time just writing the tests for them. So far, I've been good about
writing the tests before I write the code, but it takes forever and I
rarely get the tests right the first time.
I'm curious to see what comes out of your question. I'm in the same boat.