This seems to be much slower than AGM: 10k digits takes 0.13s, but 100k digits takes 33.8s. Even the Machin-type formulas are faster.
The paper "Exploration in π Calculation Using Various Methods Implemented in Python" has some insight, and is worth reading in general as well. We certainly shouldn't be computing all those factorials inside the loop; those are pretty easy to pull out. But the killer is the huge-precision divide done on every loop iteration. Binary splitting is what's needed, though it is a non-trivial change. Their Table 2 shows just how significant the difference is. If I recall correctly, fast implementations finely control the precision on each loop iteration, whereas AGM has to use full precision for every iteration.
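To make the binary splitting idea concrete, here is a minimal Python sketch. It assumes the series in question is the Chudnovsky series, which this thread does not confirm; the names `chudnovsky_bs` and `pi_digits` and the 10 guard digits are choices made for illustration, not taken from the paper or from the existing code. The point is that the recursion uses only integer multiplications, and the single big division (plus one integer square root) happens once at the end instead of on every iteration.

```python
from math import isqrt

# 640320**3 / 24, the constant in the Chudnovsky recurrence
C3_OVER_24 = 640320**3 // 24


def chudnovsky_bs(a, b):
    """Return integers (P, Q, T) combining series terms a .. b-1.

    No division happens here: the recursion only multiplies integers.
    """
    if b - a == 1:
        if a == 0:
            P = Q = 1
        else:
            P = (6 * a - 5) * (2 * a - 1) * (6 * a - 1)
            Q = a * a * a * C3_OVER_24
        T = P * (13591409 + 545140134 * a)
        if a & 1:  # terms alternate in sign
            T = -T
        return P, Q, T
    m = (a + b) // 2
    P1, Q1, T1 = chudnovsky_bs(a, m)
    P2, Q2, T2 = chudnovsky_bs(m, b)
    return P1 * P2, Q1 * Q2, Q2 * T1 + P1 * T2


def pi_digits(n):
    """Return floor(pi * 10**n) as an integer (assumed interface)."""
    guard = 10                   # extra digits to absorb truncation error
    prec = n + guard
    terms = prec // 14 + 1       # each term adds a bit over 14 digits
    P, Q, T = chudnovsky_bs(0, terms)
    sqrt_c = isqrt(10005 * 10 ** (2 * prec))   # sqrt(10005) scaled by 10**prec
    # The only full-precision division in the whole computation:
    scaled = (Q * 426880 * sqrt_c) // T
    return scaled // 10 ** guard
```

For example, `pi_digits(50)` returns π scaled by 10^50 as an integer. The splitting also keeps the two halves of each product about the same size, which is what lets a fast big-integer multiply pay off.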