Just a quick note: I tried some form of eval-style test and was pleasantly surprised to find that, as I recall, the test didn't actually invoke any tied methods. So my preference is an eval-style test that doesn't run any user code. When such a test is possible, it wins (or ties) on both criteria. Unfortunately, I don't have the exact check or the test handy, and it looks unlikely that I'll be able to dig into the details any time soon. But coming up with eval-style tests that don't emit warnings, and demonstrating that they don't call user code (or how little of it they call), would be a useful outcome (hopefully of this thread).
When that isn't possible, I wouldn't want to use your preferred test unless it came packaged in a core module such as overload (or, next best, Scalar::Util). That way, the next time this stuff changes (or a problem is found), the test is likely to get updated. Of course, that still leaves the problem of using it in a backward-compatible way. For the optimists in the crowd, that reduces to simply requiring installation of an updated overload module. But something a ton better than what Scalar::Util has managed to provide is long overdue, and I would certainly welcome it. :)
Update: noted that I was testing tied items, not overloaded ones.