I am trying to implement automatic voice recording functionality, similar to the Talking Tom app. I use the following code to read input from the audio recorder and analyse the
I tried to solve a similar problem on Windows. One thing I learned fast: simple frequency analysis with a fast Fourier transform is not enough. Lots of noises fall in the frequency range of the human voice, from simple taps on the microphone to clapping hands. Even fairly sophisticated filtering won't do it. The easiest way I've found is to send a short audio sample to a cloud speech-to-text API and ask it to transcribe the speech. If the API returns a transcript of reasonable length, I continue recording; otherwise, I stop recording. This does require that you sample some audio and send it to a cloud provider.
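To make the decision rule concrete, here is a minimal sketch of the "transcribe, then check the length" gate. It is not tied to any particular provider: the `transcriber` function is a hypothetical stand-in for whatever cloud call you use, and `TranscriptGate` and `minChars` are names and a threshold I made up for illustration.

```java
import java.util.function.Function;

/** Decides whether to keep recording based on a transcript of a short audio sample. */
final class TranscriptGate {
    // Hypothetical stand-in for the cloud call: PCM bytes in, transcript out
    // (null or empty when nothing was recognized).
    private final Function<byte[], String> transcriber;
    // Assumed threshold: minimum number of non-whitespace characters a
    // transcript needs before the sample counts as speech.
    private final int minChars;

    TranscriptGate(Function<byte[], String> transcriber, int minChars) {
        this.transcriber = transcriber;
        this.minChars = minChars;
    }

    /** Returns true if the sample transcribes to a long enough string. */
    boolean shouldKeepRecording(byte[] pcmSample) {
        String text = transcriber.apply(pcmSample);
        if (text == null) {
            return false;
        }
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (!Character.isWhitespace(text.charAt(i))) {
                count++;
            }
        }
        return count >= minChars;
    }
}
```

In a real app the transcriber would make a network request, so you would call `shouldKeepRecording` off the audio thread and tune `minChars` against your own recordings.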
What exactly are you looking for? Do you just want to filter out the human speech in the audio or do you actually want to know what the person has said?
Nearly every smartphone filters the human speech by recording the background noise with a second microphone at the back of the device and subtracting the two signals. But to be honest, I haven't seen any Android API where you can directly access the two signals.
If you want to do speech-to-text conversion, then have a look at Sphinx4 and Praat. Both do this job, but again, I haven't seen an implementation for Android. Sphinx4 claims to be written entirely in Java, so it should be possible to embed it in an Android app.
Voice detection is not that simple. There are several algorithms, some of them published, for example GSM VAD. Several open-source VAD libraries are available; some of them are discussed here
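For a feel of what the simplest possible frame-level check looks like, here is a naive sketch based on short-term energy and zero-crossing rate. This is not GSM VAD or any of the published algorithms; the class name `NaiveVad` and the thresholds `minRms` and `maxZcr` are assumptions for illustration and would need tuning on real recordings.

```java
/** Naive frame-level voice activity check: short-term energy plus zero-crossing rate. */
final class NaiveVad {
    private NaiveVad() {}

    /** Root-mean-square amplitude of one frame of 16-bit PCM samples. */
    static double rms(short[] frame) {
        if (frame.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (short s : frame) {
            sum += (double) s * s;
        }
        return Math.sqrt(sum / frame.length);
    }

    /** Fraction of adjacent sample pairs whose signs differ. */
    static double zeroCrossingRate(short[] frame) {
        if (frame.length < 2) {
            return 0.0;
        }
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) {
                crossings++;
            }
        }
        return (double) crossings / (frame.length - 1);
    }

    /** Loud enough and not too noise-like; both thresholds are assumed values. */
    static boolean looksLikeSpeech(short[] frame, double minRms, double maxZcr) {
        return rms(frame) >= minRms && zeroCrossingRate(frame) <= maxZcr;
    }
}
```

A check like this will still fire on any loud, low-frequency sound such as a tap on the microphone, which is why the dedicated VAD algorithms and libraries exist.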