Not surprisingly, the version that simply deletes keys from the source hash is by far the fastest.
No surprise, because the keys were actually deleted only once per test run; in all the other ~1e6 iterations, the "else" branch was never reached. I'm not suggesting the benchmarks be fixed, e.g. by deep-cloning $t for each iteration for every contestant, because frankly I don't see the point.

The OP uses the word "remove" twice, but apart from this hash_filter_delete entry in the benchmarks, nothing in the whole thread actually removes anything; filtered shallow copies are constructed instead. Was that the intention? There's some inconsistency there, at least, in that strings are duplicated but arrays are cloned.

Also, the useful payload is so tiny, compared to the overhead of e.g. maintaining a queue (pushing, shifting, creating anonymous arrays), and the recursion depth so shallow, that an iterative approach will show no advantage.
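To make the flaw concrete, here is a minimal sketch (the subroutine name is borrowed from the thread, but its signature and body are my guesses, not the OP's actual code): once the first call has deleted the matching keys from the shared hash, every subsequent benchmark iteration finds nothing left to delete and returns almost immediately, so the timing mostly measures an empty loop.

```perl
use strict;
use warnings;

my %t = ( keep => 1, junk1 => 2, junk2 => 3 );

# Hypothetical stand-in for the thread's hash_filter_delete:
# mutates the hash in place instead of building a filtered copy.
sub hash_filter_delete {
    my ( $h, $re ) = @_;
    for my $k ( keys %$h ) {
        delete $h->{$k} if $k =~ $re;
    }
}

hash_filter_delete( \%t, qr/^junk/ );   # 1st call: really deletes 2 keys
hash_filter_delete( \%t, qr/^junk/ );   # every later call: no matches left,
                                        # so a benchmark loop times ~nothing
```

The copy-constructing contestants don't suffer from this, since they leave %t untouched on every iteration.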