Question
I am trying to implement automatic voice recording, similar to the Talking Tom app. I use the following code to read input from the audio recorder and analyse the buffer:
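float totalAbsValue = 0.0f;
short sample = 0;
numberOfReadBytes = audioRecorder.read(audioBuffer, 0, bufferSizeInBytes);

// Analyze sound: average absolute amplitude of the 16-bit little-endian PCM data.
// Loop over the bytes actually read, not over the whole buffer.
for (int i = 0; i + 1 < numberOfReadBytes; i += 2) {
    // Mask the low byte so its sign bit is not extended into the high byte.
    sample = (short) ((audioBuffer[i] & 0xFF) | (audioBuffer[i + 1] << 8));
    totalAbsValue += Math.abs(sample);
}
if (numberOfReadBytes >= 2) {
    // Divide once, in floating point (integer division per sample would truncate).
    totalAbsValue /= numberOfReadBytes / 2;
}

// Analyze temp buffer: store the latest average and sum the last three.
tempFloatBuffer[tempIndex % 3] = totalAbsValue;
float temp = 0.0f;
for (int i = 0; i < 3; ++i) {
    temp += tempFloatBuffer[i];
}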
With this I can detect input coming from the audio recorder and analyse the audio buffer.
The buffer is converted to a float value, and if that value increases by a certain amount, I assume there is some sound in the background and start recording. The problem is that the app then starts recording on any background noise, including the sound of a fan or an AC duct.
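The trigger described above can be sketched as a small class that compares each buffer's level (for example the `totalAbsValue` average from the code above) with a slowly adapting estimate of the background level instead of a fixed value. This is a minimal sketch under assumptions: the class `LevelTrigger` and its parameters `alpha`, `ratio` and `minLoudFrames` are hypothetical and need tuning. Because the baseline follows steady noise such as a fan, it helps against constant background hum, but it still reacts to any sufficiently loud sound, voice or not.

```java
/**
 * Hypothetical helper: reports "loud" once the per-buffer level stays well above
 * a slowly adapting estimate of the background (noise-floor) level.
 */
final class LevelTrigger {
    private final double alpha;      // baseline smoothing factor, e.g. 0.05 (assumption)
    private final double ratio;      // level must exceed baseline * ratio (assumption)
    private final int minLoudFrames; // consecutive loud buffers needed (assumption)
    private double baseline = -1.0;  // negative means "not initialised yet"
    private int loudFrames = 0;

    LevelTrigger(double alpha, double ratio, int minLoudFrames) {
        this.alpha = alpha;
        this.ratio = ratio;
        this.minLoudFrames = minLoudFrames;
    }

    /** Feeds one buffer's level; returns true while the trigger condition holds. */
    boolean update(double level) {
        if (baseline < 0) {
            baseline = level; // the first buffer seeds the noise estimate
            return false;
        }
        // Math.max avoids a zero baseline after digital silence.
        boolean loud = level > Math.max(baseline, 1.0) * ratio;
        if (loud) {
            loudFrames++;
        } else {
            loudFrames = 0;
            // Adapt only on quiet buffers so speech does not inflate the estimate;
            // steady noise (fan, AC duct) is gradually absorbed into the baseline.
            baseline = (1 - alpha) * baseline + alpha * level;
        }
        return loudFrames >= minLoudFrames;
    }
}
```

For example, `new LevelTrigger(0.05, 3.0, 2)` requires two consecutive buffers more than three times louder than the learned background before it fires. Because the baseline adapts only on quiet buffers, a noise source that switches on abruptly and stays on will keep the trigger firing; letting the baseline also creep upward during long loud stretches is one possible refinement.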
Can anyone help me analyse the buffer so that only a human voice is detected? Or are there alternative ways to detect a human voice in the audio recorder input?
Answer 1:
Voice detection is not that simple. There are several voice activity detection (VAD) algorithms; some of them are published, for example the GSM VAD. Several open-source VAD libraries are available, and some of them are discussed here.
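To give a feel for what the very simplest kind of VAD does, here is a sketch of the classic short-time energy plus zero-crossing-rate heuristic. It is not the GSM VAD or any published algorithm's exact rules, only an illustration: a frame of 16-bit samples counts as speech when it is loud enough and its zero-crossing rate lies between a lower bound (to reject low-frequency hum) and an upper bound (to reject hiss-like broadband noise). The class name `SimpleVad` and all three thresholds are hypothetical and would have to be tuned on real recordings.

```java
/** Hypothetical frame classifier using short-time energy and zero-crossing rate. */
final class SimpleVad {
    private final double energyThreshold; // minimum mean-square level (16-bit units squared)
    private final double minZcr;          // below this: treated as low-frequency hum
    private final double maxZcr;          // above this: treated as hiss-like noise

    SimpleVad(double energyThreshold, double minZcr, double maxZcr) {
        this.energyThreshold = energyThreshold;
        this.minZcr = minZcr;
        this.maxZcr = maxZcr;
    }

    /** Mean of the squared samples (short-time energy). */
    static double energy(short[] frame) {
        if (frame.length == 0) return 0;
        double sum = 0;
        for (short s : frame) sum += (double) s * s;
        return sum / frame.length;
    }

    /** Fraction of adjacent sample pairs whose sign differs. */
    static double zeroCrossingRate(short[] frame) {
        if (frame.length < 2) return 0;
        int crossings = 0;
        for (int i = 1; i < frame.length; i++) {
            if ((frame[i - 1] >= 0) != (frame[i] >= 0)) crossings++;
        }
        return (double) crossings / (frame.length - 1);
    }

    /** True if the frame is loud enough and its ZCR lies inside [minZcr, maxZcr]. */
    boolean isSpeech(short[] frame) {
        double zcr = zeroCrossingRate(frame);
        return energy(frame) > energyThreshold && zcr >= minZcr && zcr <= maxZcr;
    }
}
```

For example, at 8 kHz a 50 Hz hum crosses zero about once every 80 samples (a rate of roughly 0.0125), which a lower bound can reject, while zero-mean white-noise-like hiss changes sign on about half of all sample pairs, which an upper bound can reject.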
Answer 2:
If you want a clean recording, you can:
1. Filter the noise out of the voice. You can use an FFT for that and apply filters such as low-pass, high-pass and band-pass filters (see "Filtering using FFT and Filters").
2. After filtering, the noise will be reduced, and you can pass the audio to a voice recognition API.
The more filtering, the less noise and the better the recognition, but be careful: filtering can also remove the voice together with the noise.
Also read more about the FFT: "Fast Fourier Transform of Human Voice".
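Step 1 above (FFT filtering) can be sketched as: transform a block of samples, zero every frequency bin outside the pass band, and transform back. The sketch below assumes a 300–3400 Hz pass band and a block length that is a power of two (it uses a textbook recursive radix-2 FFT); the class `FftBandPass` and its methods are hypothetical names, not a library API.

```java
/** Hypothetical FFT-based band-pass filter for one block of samples. */
final class FftBandPass {

    /** Recursive radix-2 FFT in place; the length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        if (n <= 1) return;
        int half = n / 2;
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int i = 0; i < half; i++) {
            evenRe[i] = re[2 * i];
            evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1];
            oddIm[i] = im[2 * i + 1];
        }
        fft(evenRe, evenIm, inverse);
        fft(oddRe, oddIm, inverse);
        double sign = inverse ? 1.0 : -1.0;
        for (int k = 0; k < half; k++) {
            // Twiddle factor w = exp(sign * 2*pi*i*k/n), then t = w * odd[k].
            double angle = sign * 2 * Math.PI * k / n;
            double wr = Math.cos(angle), wi = Math.sin(angle);
            double tr = wr * oddRe[k] - wi * oddIm[k];
            double ti = wr * oddIm[k] + wi * oddRe[k];
            re[k] = evenRe[k] + tr;
            im[k] = evenIm[k] + ti;
            re[k + half] = evenRe[k] - tr;
            im[k + half] = evenIm[k] - ti;
        }
    }

    /** Keeps only frequencies in [lowHz, highHz]; samples.length must be a power of two. */
    static double[] filter(double[] samples, double sampleRate, double lowHz, double highHz) {
        int n = samples.length;
        if (Integer.bitCount(n) != 1) throw new IllegalArgumentException("length must be a power of two");
        double[] re = samples.clone();
        double[] im = new double[n];
        fft(re, im, false);
        for (int k = 0; k < n; k++) {
            // Bin k and its mirror n - k represent the same physical frequency.
            int bin = Math.min(k, n - k);
            double freq = bin * sampleRate / n;
            if (freq < lowHz || freq > highHz) {
                re[k] = 0;
                im[k] = 0;
            }
        }
        fft(re, im, true);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = re[i] / n; // 1/n scaling of the inverse FFT
        return out;
    }
}
```

Zeroing bins of a single block is a "brick-wall" filter and produces discontinuities at block boundaries on real audio; streaming code would normally window overlapping blocks and recombine them (overlap-add), or use a time-domain filter instead.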
Answer 3:
For voice detection, try an FFT algorithm.
For noise, try the Speex library.
Answer 4:
One way to process the input is to use a specialised library that removes noise.
For example, Audacity (http://audacity.sourceforge.net) does noise removal.
As long as you have characterised the main types of noise, you should be left with only the speech.
It would be worthwhile to collect sample data before the user starts the capture and after the user has ended it, since this gives you at-the-time samples of the noise in the environment. This is useful if each user faces different background noise.
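Characterising the noise from samples taken before and after the capture can start with simply measuring its level. The sketch below only estimates the noise level (not its spectrum, so it removes nothing): it computes the RMS level of each ambient frame and derives a threshold of mean + k × standard deviation, which a recording trigger could use instead of a hard-coded value. The class `NoiseProfile` and the factor `k` are hypothetical.

```java
/** Hypothetical noise-floor estimate built from frames recorded while nobody speaks. */
final class NoiseProfile {
    private final double mean;   // mean RMS level of the ambient frames
    private final double stdDev; // population standard deviation of those levels

    /** frames: ambient 16-bit frames captured before (or after) the user speaks. */
    NoiseProfile(short[][] frames) {
        if (frames.length == 0) throw new IllegalArgumentException("need at least one frame");
        double[] levels = new double[frames.length];
        double sum = 0;
        for (int f = 0; f < frames.length; f++) {
            levels[f] = rms(frames[f]);
            sum += levels[f];
        }
        mean = sum / frames.length;
        double variance = 0;
        for (double level : levels) variance += (level - mean) * (level - mean);
        stdDev = Math.sqrt(variance / frames.length);
    }

    /** Root-mean-square level of one frame. */
    static double rms(short[] frame) {
        if (frame.length == 0) return 0;
        double sum = 0;
        for (short s : frame) sum += (double) s * s;
        return Math.sqrt(sum / frame.length);
    }

    /** Level above which a frame is treated as "more than background noise". */
    double threshold(double k) {
        return mean + k * stdDev;
    }

    boolean isAboveNoise(short[] frame, double k) {
        return rms(frame) > threshold(k);
    }
}
```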
Answer 5:
What exactly are you looking for? Do you just want to isolate the human speech in the audio, or do you actually want to know what the person said?
Isolating human speech is done by nearly every smartphone: it records the background noise with a second microphone at the back of the device and subtracts the two signals. But to be honest, I haven't seen any Android API where you can directly access the two signals.
If you want speech-to-text conversion, have a look at Sphinx4 and Praat. Both do this job, but again, I haven't seen an implementation for Android. Sphinx4 claims to be written entirely in Java, so it should be possible to embed it in an Android app.
Answer 6:
Have you considered using Microsoft's Speech Recognition API? You can use a key utterance to begin recording, like saying "computer" before asking the computer something in Star Trek. Use ISpRecognizer::CreateRecoContext to create a recognition context, load your recognition grammar into it and start recognition. Then use ISpPhrase to check whether you should begin recording.
Answer 7:
In the completely general case, this is an unsolved problem. In practice, though:
The first step is to get as noise-free a recording as possible. As others have noted, that starts with a directional microphone aimed as precisely as possible at the sound you want to keep.
The second step is filtering. As noted previously, telephone companies did a lot of work on which frequency ranges humans actually need to understand speech. Filtering out the frequencies outside that range will make the voice sound like... well, a telephone... but it will get rid of more of the background noise.
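The classic telephone band is roughly 300–3400 Hz. One way to apply it in real time, buffer by buffer, is a cascade of two IIR biquad sections: a high-pass at the lower edge and a low-pass at the upper edge. The sketch below uses the well-known coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook with a Butterworth Q; the class `VoiceBandFilter` and the band edges are assumptions. Each second-order section rolls off at only 12 dB per octave, so fan rumble just below 300 Hz is reduced, not removed.

```java
/** Hypothetical telephone-band filter: a 2nd-order high-pass followed by a 2nd-order low-pass. */
final class VoiceBandFilter {

    /** One biquad section in Direct Form I, coefficients from the RBJ Audio EQ Cookbook. */
    static final class Biquad {
        private final double b0, b1, b2, a1, a2; // already divided by a0
        private double x1, x2, y1, y2;            // previous two inputs and outputs

        Biquad(double b0, double b1, double b2, double a0, double a1, double a2) {
            this.b0 = b0 / a0;
            this.b1 = b1 / a0;
            this.b2 = b2 / a0;
            this.a1 = a1 / a0;
            this.a2 = a2 / a0;
        }

        static Biquad lowPass(double fs, double f0, double q) {
            double w0 = 2 * Math.PI * f0 / fs;
            double cos = Math.cos(w0), alpha = Math.sin(w0) / (2 * q);
            return new Biquad((1 - cos) / 2, 1 - cos, (1 - cos) / 2, 1 + alpha, -2 * cos, 1 - alpha);
        }

        static Biquad highPass(double fs, double f0, double q) {
            double w0 = 2 * Math.PI * f0 / fs;
            double cos = Math.cos(w0), alpha = Math.sin(w0) / (2 * q);
            return new Biquad((1 + cos) / 2, -(1 + cos), (1 + cos) / 2, 1 + alpha, -2 * cos, 1 - alpha);
        }

        double process(double x) {
            double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
            x2 = x1;
            x1 = x;
            y2 = y1;
            y1 = y;
            return y;
        }
    }

    private final Biquad highPass;
    private final Biquad lowPass;

    VoiceBandFilter(double sampleRate, double lowHz, double highHz) {
        double q = 1 / Math.sqrt(2); // Butterworth (maximally flat) Q for each section
        highPass = Biquad.highPass(sampleRate, lowHz, q);
        lowPass = Biquad.lowPass(sampleRate, highHz, q);
    }

    /** Filters one sample; state is kept between calls, so buffers can be fed one after another. */
    double process(double x) {
        return lowPass.process(highPass.process(x));
    }
}
```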
If you want to go beyond that, things can get really complicated. There are algorithms that, if you show them a sample of what you consider noise in that particular recording, will analyse it and try to subtract it out without damaging the sound you want to keep too much. This is not simple programming; if I were you, I'd seriously consider buying such a tool from someone who has already gotten it right rather than trying to reinvent or reimplement it. I don't know whether any of them are available for Android, or whether a typical Android device has enough computing power to run them in anything like real time. (I've used SoundSoap in the studio to remove A/C noise, and it works very well.)
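The classic example of the "show it a sample of the noise and subtract it" approach is spectral subtraction. The sketch below shows only its core step for one block: take the magnitude spectrum of a noise-only block, subtract it from the magnitude spectrum of the noisy block (flooring at zero), keep the noisy block's phase, and transform back. The class `SpectralSubtraction` is hypothetical and deliberately bare (a single block, no windowing, overlap-add or averaging of the noise estimate); on real recordings such a naive version leaves "musical noise" artifacts, which is part of why mature tools are worth paying for.

```java
/** Hypothetical single-block spectral subtraction (no windowing or overlap-add). */
final class SpectralSubtraction {

    /** Recursive radix-2 FFT in place; the length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        if (n <= 1) return;
        int half = n / 2;
        double[] evenRe = new double[half], evenIm = new double[half];
        double[] oddRe = new double[half], oddIm = new double[half];
        for (int i = 0; i < half; i++) {
            evenRe[i] = re[2 * i];
            evenIm[i] = im[2 * i];
            oddRe[i] = re[2 * i + 1];
            oddIm[i] = im[2 * i + 1];
        }
        fft(evenRe, evenIm, inverse);
        fft(oddRe, oddIm, inverse);
        double sign = inverse ? 1.0 : -1.0;
        for (int k = 0; k < half; k++) {
            double angle = sign * 2 * Math.PI * k / n;
            double wr = Math.cos(angle), wi = Math.sin(angle);
            double tr = wr * oddRe[k] - wi * oddIm[k];
            double ti = wr * oddIm[k] + wi * oddRe[k];
            re[k] = evenRe[k] + tr;
            im[k] = evenIm[k] + ti;
            re[k + half] = evenRe[k] - tr;
            im[k + half] = evenIm[k] - ti;
        }
    }

    /** Subtracts the noise magnitude spectrum from the noisy one, keeping the noisy phase. */
    static double[] denoise(double[] noisy, double[] noiseOnly) {
        int n = noisy.length;
        if (noiseOnly.length != n || Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("blocks must have the same power-of-two length");
        }
        double[] noiseRe = noiseOnly.clone(), noiseIm = new double[n];
        fft(noiseRe, noiseIm, false);
        double[] re = noisy.clone(), im = new double[n];
        fft(re, im, false);
        for (int k = 0; k < n; k++) {
            double mag = Math.hypot(re[k], im[k]);
            double noiseMag = Math.hypot(noiseRe[k], noiseIm[k]);
            double cleanMag = Math.max(mag - noiseMag, 0.0); // floor negative results at zero
            double phase = Math.atan2(im[k], re[k]);          // phase of the noisy signal
            re[k] = cleanMag * Math.cos(phase);
            im[k] = cleanMag * Math.sin(phase);
        }
        fft(re, im, true);
        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = re[i] / n; // 1/n scaling of the inverse FFT
        return out;
    }
}
```

Because only magnitudes are compared, the noise sample does not need to be phase-aligned with the recording; it only needs a similar spectrum, which is why it is recorded separately (for example before the user starts speaking).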
In fact, my own inclination would be to reduce the problem to a solved one: use the most directional and closest mic I could get, let Android do the recording, and then clean up the signal afterwards with off-the-shelf tools. But I admit I'm biased, because I have already invested in the latter.
Answer 8:
I tried to solve a similar problem on Windows. One thing I learned fast: simple frequency analysis with a fast Fourier transform is not enough. Lots of noises fall within the human voice frequency range, from simple taps on the microphone to clapping hands. Even fairly sophisticated filtering won't do it. I've found the easiest way is to send the audio to a cloud API and ask it to transcribe the speech. If the cloud API can transcribe it to a string of reasonable length, I continue recording; otherwise, I stop recording. This does require that you send samples of the audio, noise included, to a cloud provider.
Answer 9:
Most of the other answers have misunderstood the question, and their replies solve problems different from yours.
You should parse the audio in your buffer, looking for frequencies in the human voice range. As soon as you detect them, it means someone has started talking and you can start recording (don't forget to include the current buffer too, as it contains the first part of the speech).
Search for routines that print the list of frequencies in a raw audio stream.
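One way to get the list of frequencies in a buffer is a discrete Fourier transform. The sketch below uses a plain O(n²) DFT for clarity (an FFT gives the same numbers faster) and reduces the spectrum to a single number: the fraction of the buffer's energy that lies in an assumed voice band such as 300–3400 Hz. The class `BandEnergy`, the band edges and whatever ratio you treat as "someone is talking" are hypothetical. As another answer points out, many non-speech sounds also have energy in this band, so a high ratio is a hint rather than proof.

```java
/** Hypothetical check: fraction of a buffer's spectral energy inside a frequency band. */
final class BandEnergy {

    /** Power |X[k]|^2 of DFT bins k = 0..n/2 of a real signal (naive O(n^2) DFT). */
    static double[] powerSpectrum(double[] x) {
        int n = x.length;
        double[] power = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++) {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++) {
                double angle = -2 * Math.PI * k * t / n;
                re += x[t] * Math.cos(angle);
                im += x[t] * Math.sin(angle);
            }
            power[k] = re * re + im * im;
        }
        return power;
    }

    /** Fraction (0..1) of the energy in bins whose frequency lies in [lowHz, highHz]. */
    static double ratio(double[] x, double sampleRate, double lowHz, double highHz) {
        int n = x.length;
        double[] power = powerSpectrum(x);
        double inBand = 0, total = 0;
        for (int k = 0; k < power.length; k++) {
            // Bins other than DC and Nyquist also stand for their negative-frequency mirror.
            double weight = (k == 0 || 2 * k == n) ? 1.0 : 2.0;
            double freq = k * sampleRate / n;
            total += weight * power[k];
            if (freq >= lowHz && freq <= highHz) inBand += weight * power[k];
        }
        return total == 0 ? 0 : inBand / total;
    }
}
```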
Source: https://stackoverflow.com/questions/18355448/detect-human-voice-from-audio-file-input