speech-recognition

Python SpeechRecognition word by word? continuous output?

ε祈祈猫儿з posted on 2020-06-27 17:28:11
Question: I was wondering whether there is a way to output words as soon as possible. For example, if I say "hello world" it should output: hello world

Currently I'm using this code:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        while True:
            r.pause_threshold = 0.1  # I tried playing with these 3 but no luck
            r.phrase_threshold = 0.5
            r.non_speaking_duration = 0.1
            audio = r.listen(source)
            try:
                text = r.recognize_google(audio)
                print(text)
            except Exception as e:
                print("-")

What

python webrtc voice activity detection is wrong

让人想犯罪 __ posted on 2020-06-27 11:17:36
Question: I need to do voice activity detection as a step in classifying audio files. Basically, I need to know with certainty whether a given audio file contains spoken language. I am using py-webrtcvad, which I found on GitHub and which is scarcely documented: https://github.com/wiseman/py-webrtcvad The thing is, when I try it on my own audio files, it works fine with the ones that contain speech but keeps yielding false positives when I feed it other types of audio (like music or bird sounds), even if I set aggressiveness at
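A note on the setup described in this question: py-webrtcvad classifies short fixed-length frames (10, 20 or 30 ms of 16-bit mono PCM), not whole files, so a file-level decision has to be built from per-frame results. The following is a minimal sketch of that aggregation step, not a fix for the false positives themselves. The helper names `split_frames` and `speech_ratio` are hypothetical, and the per-frame classifier is passed in as a callable so the sketch does not depend on any particular library.

```python
from typing import Callable, Iterator

SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit PCM


def split_frames(pcm: bytes, sample_rate: int, frame_ms: int = 30) -> Iterator[bytes]:
    """Yield fixed-size frames of 16-bit mono PCM; a trailing partial frame is dropped."""
    if frame_ms not in (10, 20, 30):
        raise ValueError("frame_ms must be 10, 20 or 30")
    frame_bytes = sample_rate * frame_ms // 1000 * SAMPLE_WIDTH
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def speech_ratio(
    pcm: bytes,
    sample_rate: int,
    is_speech: Callable[[bytes, int], bool],
    frame_ms: int = 30,
) -> float:
    """Return the fraction of frames that the given classifier flags as speech."""
    frames = list(split_frames(pcm, sample_rate, frame_ms))
    if not frames:
        return 0.0
    voiced = sum(1 for frame in frames if is_speech(frame, sample_rate))
    return voiced / len(frames)


if __name__ == "__main__":
    # One second of digital silence at 16 kHz, with a dummy classifier
    # standing in for a real one (assumption: replace with your own).
    silence = b"\x00\x00" * 16000
    print(speech_ratio(silence, 16000, lambda frame, rate: False))
```

With py-webrtcvad installed, `webrtcvad.Vad(mode).is_speech` can be passed as the `is_speech` argument. The resulting ratio is only a score to threshold; whether any threshold separates speech from music or birdsong in a given data set has to be checked empirically.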

SFSpeechRecognizer on MacOS not available despite successful authorization

与世无争的帅哥 posted on 2020-06-17 01:48:52
Question: I am trying to get a clumsy Objective-C proof-of-concept example to run with SFSpeechRecognizer on Catalina, transcribing a local audio file. After some googling I managed to get the authorization to work by adding an Info.plist with NSSpeechRecognitionUsageDescription. I get the authorization dialog and the correct SFSpeechRecognizerAuthorizationStatus (SFSpeechRecognizerAuthorizationStatusAuthorized). However, my SFSpeechRecognizer instance is still unavailable. I suspect I must be

Long audio speech recognition on Android

一笑奈何 posted on 2020-06-11 06:02:08
Question: I want to develop a module that uses speech-to-text support on Android. I found a lot of documentation and demos related to RecognizerIntent and the like, but all of these demos only capture voice for 10 seconds or so. I want my demo to run for more than 5-10 minutes. It is not a problem if it does not run offline, as my app always works online. I have also looked into Pocketsphinx on Android, but that didn't work out well. Also, that gave support just

How do I control when to stop the audio input?

淺唱寂寞╮ posted on 2020-06-02 06:20:46
Question: I am using the SpeechRecognition Python package to get audio from the user.

    import speech_recognition as sr

    # obtain audio from the microphone
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something!")
        audio = r.listen(source)

When executed, this piece of code starts listening for audio input from the user. If the user does not speak for a while, it stops automatically. How can I tell that it has stopped listening to audio? How can I manually

'Audio data must be audio data' error with google speech recognition in python

大城市里の小女人 posted on 2020-05-29 10:14:02
Question: I am trying to load an audio file in Python and process it with Google speech recognition. The problem is that, unlike in C++, Python doesn't show data types or classes, or give you access to memory to convert between one data type and another by creating a new object and repacking data. I don't understand how it's possible to convert from one data type to another in Python. The code in question is below:

    import speech_recognition as spr
    import librosa

    audio, sr = librosa.load('sample_data/metal.mp3
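A note on the type mismatch in this question: `librosa.load` returns a floating-point sample array plus a sample rate, while the SpeechRecognition recognizers expect an `AudioData` object built from raw PCM bytes. A minimal sketch of the repacking step follows, assuming mono samples in the range [-1.0, 1.0]. The helper name `float_to_pcm16` is hypothetical, and the library calls in the trailing comments are not executed here.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    Values outside the range are clipped rather than wrapped.
    """
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        ints.append(int(round(clipped * 32767)))
    return struct.pack("<%dh" % len(ints), *ints)


if __name__ == "__main__":
    pcm = float_to_pcm16([0.0, 1.0, -1.0])
    print(len(pcm))  # 3 samples * 2 bytes each

    # Usage with the libraries from the question (assumed installed;
    # `path` is a placeholder for your own file):
    #   samples, rate = librosa.load(path)   # float samples, mono by default
    #   audio = spr.AudioData(float_to_pcm16(samples), rate, 2)
    #   text = spr.Recognizer().recognize_google(audio)
```

The final argument `2` passed to `AudioData` is the sample width in bytes, matching the 16-bit output of the helper.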

recording and speech recognition at the same time with SpeechRecognizer and MediaRecorder

会有一股神秘感。 posted on 2020-05-28 04:45:21
Question: I'm trying to record audio and do speech recognition at the same time. Each of them works separately, but together only the recording works. The code looks like this:

    private SpeechRecognizer sr;
    private MediaRecorder recorder;

    private void startRecording() throws IOException {
        recorder = new MediaRecorder();
        recorder.setAudioSource(MediaRecorder.AudioSource.MIC);
        recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
        recorder.setOutputFile("/dev/null");
        recorder.setAudioEncoder

Speech recognition in real-time with punctuation

别说谁变了你拦得住时间么 posted on 2020-05-27 11:34:50
Question: How can I implement speech recognition (voice to text) with automatic punctuation? I want to use it to turn a lecture (a 45-minute talk) into text and, if possible, update the view dynamically. I tried SpeechRecognizer, but it only gives me words without punctuation and stops listening after the first words.

Answer 1: You can use Punctuator; it assigns punctuation to ASR results with the help of a deep neural network.

Source: https://stackoverflow.com/questions/40961892/speech-recognition-in-real-time-with