"The Python folks are quick to point out that you can always convert recursive functions to iterative ones and that they're always faster"
This is true, but if it's really as easy as all that, why doesn't Python do it for me, so that my code can look as simple as possible? (I say this not to slam Python implementers, but because the answer explains the problem).
Recursion optimisations have been present in functional languages since, like, the 14th century or something. Haskell, CAML, and Lisp implementations all typically convert at least tail-recursive functions to iterations. You do this by spotting that it's possible, i.e. that the function can be rearranged such that no local variables other than the return value are used after the recursive call. One trick to make this possible, when some work is done to the recursed return value before returning, is to introduce an additional "accumulator" parameter. In simple terms this means the work can effectively be done on the way "down" instead of on the way "up": search around for 'how to make a function tail-recursive' for details.
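To make the accumulator trick concrete, here's a minimal sketch in Python (function names are my own, and of course CPython won't actually eliminate the tail call; the point is the shape a TCO-capable compiler looks for):

```python
# Naive version: the multiplication happens *after* the recursive
# call returns, so the function is not tail-recursive.
def fact(n):
    if n == 0:
        return 1
    return n * fact(n - 1)  # work done on the way "up"

# Accumulator version: the multiplication is done before recursing,
# so the recursive call is the very last thing the function does.
def fact_tail(n, acc=1):
    if n == 0:
        return acc
    return fact_tail(n - 1, n * acc)  # work done on the way "down"
```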
Turning a tail-recursive function into a loop is basically a matter of jiggering with your calling convention so that you can "perform the call" simply by setting up the parameters and jumping back to the start of the function, without bothering to save all the stuff in scope that you know you'll never use. In assembler terms, you don't have to preserve caller-saves registers if data-flow analysis tells you they're unused beyond the call, and the same goes for anything on the stack: you don't have to move the stack pointer on a call if you don't mind "your" bit of stack being scribbled over by the next recursion/iteration.
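In Python terms, that jump-back-to-the-start move is nothing more than a while loop that rebinds the parameters; a sketch, continuing the hypothetical fact_tail above:

```python
# What a tail-call-optimising compiler effectively produces:
# "performing the call" is just rebinding the parameters and
# jumping back to the top of the function.
def fact_loop(n, acc=1):
    while n != 0:
        n, acc = n - 1, n * acc  # set up the "call's" arguments
    return acc                   # base case reached: return the accumulator
```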
Contrary to how you've paraphrased the Python folks, converting a general recursive function to an iteration is not trivial: if it's multiply recursive, for example, then even a simple conversion still needs a stack.
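For instance, here's a sketch (naive Fibonacci, names mine) of doing that conversion by hand: the recursion doesn't disappear, it just becomes an explicit stack of pending subproblems, and it's every bit as slow as the original:

```python
# Multiply-recursive original: two recursive calls per invocation.
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Iterative version: the call stack is replaced by an explicit
# list of subproblems still waiting to be evaluated.
def fib_stack(n):
    pending, total = [n], 0
    while pending:
        m = pending.pop()
        if m < 2:
            total += m  # base case: add its value to the running total
        else:
            pending.extend([m - 1, m - 2])  # "recurse" by pushing work
    return total
```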
Memoization, though, is a useful technique for arbitrarily recursive functions, which you might like to look up if you're interested in the possible approaches. It means that each time the function is evaluated, you stick the result in a cache. To use this to optimize recursion: if your recursive function counts "down", and you memoize it, then you can evaluate it iteratively by adding a loop that counts "up", calculating each value of the function in turn until you reach the target. This uses very little stack space provided the memo cache is big enough to hold all the values you'll need: for instance, if f(n) depends on f(n-1), f(n-2) and f(n-3), you only need space for 3 values in the cache; as you go up, you can kick the ladder away behind you. If f(n) depends on f(n-1) and f(n/2), you need lots of space in the cache, but still less than you'd use for the stack in an unoptimised recursion.
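As a sketch of that first case (base cases assumed, since nothing above pins down a concrete f), here's the counting-"up" evaluation of a hypothetical f(n) = f(n-1) + f(n-2) + f(n-3) with a fixed three-slot cache:

```python
# Hypothetical recurrence: f(n) = f(n-1) + f(n-2) + f(n-3),
# with assumed base cases f(0) = 0, f(1) = 0, f(2) = 1.
def f_iterative(n):
    cache = [0, 0, 1]  # only ever holds f(k-3), f(k-2), f(k-1)
    if n < 3:
        return cache[n]
    for _ in range(3, n + 1):
        # Compute the next value, then kick the ladder away:
        # the oldest cached value is dropped.
        cache = [cache[1], cache[2], cache[0] + cache[1] + cache[2]]
    return cache[-1]
```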