You can eliminate the memcpy by writing directly into the caller's buffer.
To do that safely, have the caller pass the size of the buffer as well.
The other bottleneck is the division, but I don't see a way around that.
Edit 1: correct initialization of buffer pointer
    char *_i32toa(char *const rtn, unsigned int buff_size, int32_t i)
    {
        if (NULL == rtn) return NULL;
        uint32_t ut, ui;
        char minus_sign = 0;
        char *p = rtn + buff_size - 1;
        // As before, without memcpy.
        return rtn;
    }
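
For reference, here is one way the full function could look. This is only a sketch under my own assumptions, not the original code: the name i32toa_sketch is made up, the size is taken as a size_t, and the digits are written backwards from the end of the buffer, so the function returns a pointer to the first character inside the buffer rather than the start of the buffer. It returns NULL if the buffer is missing or too small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: formats i as decimal at the END of the caller's
   buffer and returns a pointer to the first character (inside buf).
   Returns NULL if buf is NULL or too small. */
char *i32toa_sketch(char *const buf, size_t buf_size, int32_t i)
{
    if (buf == NULL || buf_size == 0) return NULL;

    /* Take the magnitude in unsigned arithmetic so INT32_MIN is safe. */
    uint32_t u = (i < 0) ? 0u - (uint32_t)i : (uint32_t)i;

    char *p = buf + buf_size - 1;   /* last byte holds the terminator */
    *p = '\0';

    do {
        if (p == buf) return NULL;  /* no room for another digit */
        *--p = (char)('0' + u % 10u);
        u /= 10u;
    } while (u != 0);

    if (i < 0) {
        if (p == buf) return NULL;  /* no room for the sign */
        *--p = '-';
    }
    return p;
}
```

A 12-byte buffer is enough for any int32_t (10 digits, a sign, and the terminator). Note that the caller must use the returned pointer, since the string generally does not start at the beginning of the buffer.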