Question
Is there a way to output each word as soon as it is recognized? For example, if I say "hello world", it should output:
hello
world
Currently I'm using this code:
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:
    while True:
        r.pause_threshold = 0.1  # I tried playing with these 3 but no luck
        r.phrase_threshold = 0.5
        r.non_speaking_duration = 0.1
        audio = r.listen(source)
        try:
            text = r.recognize_google(audio)
            print(text)
        except Exception as e:
            print("-")
This records until the microphone no longer hears anything and then outputs everything it heard on one line. I want to see what has been said as quickly as possible.
Answer 1:
There are streaming libraries that do this. One is Google's speech API Python client. Another is https://github.com/alphacep/vosk-api. With the latter, the Python code should look like this; it returns partial results immediately as you speak.
import os
import pyaudio
from vosk import Model, KaldiRecognizer

if not os.path.exists("model-en"):
    print("Please download the model from https://github.com/alphacep/vosk-android-demo/releases and unpack as 'model-en' in the current folder.")
    exit(1)

# Open the microphone as a 16 kHz, 16-bit mono input stream
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8000)
stream.start_stream()

model = Model("model-en")
rec = KaldiRecognizer(model, 16000)

while True:
    data = stream.read(2000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        # An utterance has ended: print its final result
        print(rec.Result())
    else:
        # Still mid-utterance: print the hypothesis so far
        print(rec.PartialResult())

print(rec.FinalResult())
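The loop above prints the whole partial hypothesis on every chunk, so the same words repeat. To get one word per line, as the question asks, one option is to track how many words have already been printed. The following is only a rough sketch: `new_words` and `demo` are hypothetical helpers, not part of Vosk, and the sketch assumes `PartialResult()` returns a JSON string with a `"partial"` field. The recognizer may also revise earlier words in a later partial, which this simple counter does not correct.

```python
import json


def new_words(seen_count, partial_json):
    """Return (words not yet printed, updated word count) for one partial result.

    Hypothetical helper: partial_json is assumed to look like
    '{"partial": "hello world"}'.
    """
    words = json.loads(partial_json).get("partial", "").split()
    if len(words) < seen_count:
        # The hypothesis got shorter (revised or reset); start from its end.
        seen_count = len(words)
    return words[seen_count:], len(words)


def demo():
    """Feed a made-up sequence of partial results and collect the new words."""
    partials = [
        '{"partial": ""}',
        '{"partial": "hello"}',
        '{"partial": "hello"}',
        '{"partial": "hello world"}',
    ]
    seen = 0
    printed = []
    for partial in partials:
        fresh, seen = new_words(seen, partial)
        printed.extend(fresh)
    return printed


if __name__ == "__main__":
    for word in demo():
        print(word)
```

In the recognition loop, the `else` branch would call `new_words(seen, rec.PartialResult())` and print each returned word, and `seen` would be reset to 0 whenever `rec.AcceptWaveform(data)` returns true, since a new utterance starts at that point.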
Source: https://stackoverflow.com/questions/60684279/python-speechrecognition-word-by-word-continuous-output