I am trying to implement automatic voice recording functionality, similar to the Talking Tom app. I use the following code to read input from the audio recorder and analyse the
In the completely general case, this is an unsolved problem. In the practical sense...
The first step is to get as noise-free a recording as possible. As others have noted, that starts with a directional microphone focused as tightly as possible on the sound you want to keep.
Second step is filtering. As noted previously, the telephone company did a lot of work on which frequency ranges are actually needed by humans for speech comprehension. Filtering out frequencies outside that range will make the voice sound like... well, a telephone... but will get rid of more of the background noise.
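To illustrate the filtering step, here is a minimal sketch of a band-pass filter built from a first-order high-pass followed by a first-order low-pass. The class name VoiceBandFilter is made up for this example, and the 300 Hz and 3400 Hz cut-offs used in the usage note are the figures commonly quoted for the telephone band, not something stated above. A real implementation would probably want steeper filters.

```java
// Hypothetical helper: crude band-pass made of two cascaded first-order RC-style filters.
final class VoiceBandFilter {
    private final double hpAlpha;   // high-pass smoothing coefficient
    private final double lpAlpha;   // low-pass smoothing coefficient
    private double hpPrevIn;        // previous high-pass input sample
    private double hpPrevOut;       // previous high-pass output sample
    private double lpPrevOut;       // previous low-pass output sample

    VoiceBandFilter(double sampleRateHz, double lowCutHz, double highCutHz) {
        double dt = 1.0 / sampleRateHz;
        double hpRc = 1.0 / (2.0 * Math.PI * lowCutHz);
        double lpRc = 1.0 / (2.0 * Math.PI * highCutHz);
        hpAlpha = hpRc / (hpRc + dt);
        lpAlpha = dt / (lpRc + dt);
    }

    // Filters one sample: the high-pass removes rumble and hum, the low-pass removes hiss.
    double process(double x) {
        double hp = hpAlpha * (hpPrevOut + x - hpPrevIn);
        hpPrevIn = x;
        hpPrevOut = hp;
        lpPrevOut += lpAlpha * (hp - lpPrevOut);
        return lpPrevOut;
    }

    // Convenience wrapper that filters a whole buffer with a fresh filter state.
    static double[] filter(double[] in, double sampleRateHz, double lowCutHz, double highCutHz) {
        VoiceBandFilter f = new VoiceBandFilter(sampleRateHz, lowCutHz, highCutHz);
        double[] out = new double[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = f.process(in[i]);
        }
        return out;
    }

    // Root-mean-square level of a buffer, handy for comparing before and after.
    static double rms(double[] x) {
        if (x.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;
        }
        return Math.sqrt(sum / x.length);
    }
}
```

Calling VoiceBandFilter.filter(samples, 44100.0, 300.0, 3400.0) should knock mains hum down noticeably while leaving a mid-range tone largely intact. First-order slopes are gentle, so noise just outside the band is only partly attenuated.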
If you want to go beyond that, things can get really complicated. There are some algorithms which, if you can show them a sample of what you consider noise on that particular recording, will analyse it and try to subtract it out without damaging the sound you want to keep too much. This is not simple programming; if I were you I'd seriously consider buying it from someone who has already gotten it right rather than trying to reinvent/reimplement it. I don't know whether any of them are available for Android or whether the typical Android box has enough computing power to execute them in anything like realtime. (I've used SoundSoap in the studio to remove A/C noise, and it works very well.)
In fact, my own inclination would be to simplify the problem to a solved one: use the most directional and closest mike I could get, let Android do the recording... but then do the signal processing to clean it up later, using off-the-shelf tools. But I admit I'm biased because I have already invested in the latter.
Most of the other answers have misunderstood the question, and their replies solve problems different from yours.
You should parse the audio in your buffer, searching for frequencies in the human voice range. As soon as you detect them, it means someone has started talking, and you can start recording (don't forget to include the buffer too, as it contains the first part of the speech).
Search for routines that print the list of frequencies in a raw audio stream.
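As a rough sketch of what such a routine could look like, the class below measures how much of a frame's energy falls inside a frequency band, using the Goertzel recurrence to evaluate individual DFT bins. The class and method names are invented for this example, and the 300 to 3400 Hz band and the 0.6 threshold are assumptions you would need to tune. Band energy alone cannot tell a voice from other sounds in the same range, so treat it as a starting point only.

```java
// Hypothetical detector: fraction of a frame's energy that lies in a given band.
final class VoiceBandDetector {

    // Squared DFT magnitude of one bin, computed with the Goertzel recurrence.
    static double goertzelPower(short[] frame, int bin) {
        double coeff = 2.0 * Math.cos(2.0 * Math.PI * bin / frame.length);
        double s1 = 0.0;
        double s2 = 0.0;
        for (short sample : frame) {
            double s0 = sample + coeff * s1 - s2;
            s2 = s1;
            s1 = s0;
        }
        return s1 * s1 + s2 * s2 - coeff * s1 * s2;
    }

    // Returns a value between 0 and roughly 1: in-band energy divided by total energy.
    static double voiceBandFraction(short[] frame, double sampleRateHz,
                                    double lowHz, double highHz) {
        int n = frame.length;
        double total = 0.0;
        for (short s : frame) {
            total += (double) s * s;
        }
        if (total == 0.0) {
            return 0.0; // pure silence
        }
        int firstBin = Math.max(1, (int) Math.ceil(lowHz * n / sampleRateHz));
        int lastBin = Math.min(n / 2 - 1, (int) Math.floor(highHz * n / sampleRateHz));
        double inBand = 0.0;
        for (int k = firstBin; k <= lastBin; k++) {
            inBand += goertzelPower(frame, k);
        }
        // Parseval: the sum of |X[k]|^2 over all bins equals n * total.
        // The factor 2 accounts for the mirrored negative-frequency bins.
        return 2.0 * inBand / (n * total);
    }

    // Assumed band and threshold; both are guesses to be tuned on real recordings.
    static boolean looksLikeVoice(short[] frame, double sampleRateHz) {
        return voiceBandFraction(frame, sampleRateHz, 300.0, 3400.0) > 0.6;
    }
}
```

You would feed it each frame read from the recorder and start saving once looksLikeVoice returns true, prepending the frames you buffered just before the trigger so the first syllable is not lost.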
For voice detection, try the FFT algorithm.
For noise reduction, try the Speex library.
The way to process the input is to use a specialised library which removes noise.
For example, Audacity (http://audacity.sourceforge.net) does noise removal.
So long as you have characterised the main types of noise, you should have only speech remaining.
It would be worthwhile collecting audio samples just before the user starts the capture and just after the user ends it, as these would provide at-the-time samples of the noise in the environment. This is useful if each user faces unique background noise challenges.
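One simple way to use such ambient samples, sketched below, is to measure their RMS level and treat anything well above it as a candidate for speech. The NoiseFloorGate class and the margin parameter are hypothetical names for this example. A real noise-removal library would do far more than compare levels, and the floor may need re-measuring if the environment changes.

```java
// Hypothetical gate: learns a noise floor from ambient samples, then flags louder frames.
final class NoiseFloorGate {
    private final double threshold;

    // ambient: samples captured while the user is not speaking.
    // margin: how many times louder than the noise floor a frame must be (assumed, tune it).
    NoiseFloorGate(short[] ambient, double margin) {
        this.threshold = rms(ambient) * margin;
    }

    // Root-mean-square level of a buffer of 16-bit PCM samples.
    static double rms(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (short s : samples) {
            sum += (double) s * s;
        }
        return Math.sqrt(sum / samples.length);
    }

    // True when the frame is louder than the calibrated noise floor times the margin.
    boolean isAboveNoise(short[] frame) {
        return rms(frame) > threshold;
    }

    double threshold() {
        return threshold;
    }
}
```

For instance, new NoiseFloorGate(ambientSamples, 3.0) would only pass frames at least three times louder than the measured background.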
Have you considered using Microsoft's Speech Recognition API? You can use a spoken keyword to begin recording, like how they say "computer" before asking the computer something in Star Trek. Use ISpRecognizer::CreateRecoContext to load your recognition grammar and start recognition. Then implement a check with ISpPhrase to see whether you should begin recording or not.
If you want to have a clean recording you can
2. After filtering, the noise will be reduced and you can use voice recognition APIs.
The more filtering you apply, the less noise remains and the better recognition works, but be careful: filtering can also remove the voice along with the noise.
Also read more about the FFT:
Fast Fourier Transform of Human Voice
Hope This Helps :)