Question
I have an audio file in WAV format that I want to transcribe.
My code is:
import speech_recognition as sr
r = sr.Recognizer()  # recognizer instance; its creation was not shown in the original snippet
harvard = sr.AudioFile('speech_file.wav')
with harvard as source:
    try:
        audio = r.listen(source)
        #print("Done")
    except sr.UnknownValueError:
        exec()
r.recognize_google(audio)
I do receive an output:
Out[20]: 'thank you for calling my name is Denise who I have a pleasure speaking with hi my name is Mary Jane. Good afternoon Mary Jane I do have your account open with your email'
However, a lot more is spoken after this. I think it only captures this part of the speech because there is a short pause after the word "email" in the audio file. I tried to set the duration, but I received an error:
import speech_recognition as sr
harvard = sr.AudioFile('speech_file.wav')
with harvard as source:
    try:
        audio = r.listen(source,duration = 200)
        #print("Done")
    except sr.UnknownValueError:
        exec()
r.recognize_google(audio)
Traceback (most recent call last):
  File "<ipython-input-24-30fb65edc627>", line 5, in <module>
    audio = r.listen(source,duration = 200)
TypeError: listen() got an unexpected keyword argument 'duration'
What do I do so that my code transcribes the entire audio file and does not stop at pauses?
Answer 1:
You can use timeout instead of duration, like so:
    audio = r.listen(source, timeout=2)
This means that the recognizer will wait at most two seconds for a phrase to start before giving up and raising a speech_recognition.WaitTimeoutError exception. If timeout=None (the default), there is no wait timeout, which is your case.
EDIT
All that recognize_google() does is call the Google Speech API and return the result. When I used the provided audio file, I got back the transcription of the first 30 seconds. That is due to a limitation of the free version of the Google Speech API and has nothing to do with the code.
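One possible workaround, sketched here under assumptions rather than taken from the answer, is to read the file with Recognizer.record() instead of listen(). record() accepts offset and duration arguments and does not stop at pauses, so the file can be sent to the API in pieces. The helper names chunk_windows, wav_duration and transcribe_in_chunks are hypothetical, and the 30-second chunk size is only a guess based on the limit described above.

```python
import wave


def chunk_windows(total_seconds, chunk_seconds=30.0):
    """Return (offset, duration) pairs that together cover total_seconds."""
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it does not run past the end.
        windows.append((offset, min(chunk_seconds, total_seconds - offset)))
        offset += chunk_seconds
    return windows


def wav_duration(path):
    """Length of a WAV file in seconds, read with the standard-library wave module."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def transcribe_in_chunks(path, chunk_seconds=30.0):
    """Transcribe a WAV file piece by piece (hypothetical helper, needs network access)."""
    # Imported here so the two helpers above work without the package installed.
    import speech_recognition as sr

    r = sr.Recognizer()
    pieces = []
    for offset, duration in chunk_windows(wav_duration(path), chunk_seconds):
        with sr.AudioFile(path) as source:
            # record() reads a fixed span of the file and ignores pauses.
            audio = r.record(source, offset=offset, duration=duration)
        try:
            pieces.append(r.recognize_google(audio))
        except sr.UnknownValueError:
            # No intelligible speech was found in this chunk; skip it.
            continue
    return " ".join(pieces)
```

Calling transcribe_in_chunks('speech_file.wav') would then return the joined text. Cutting at fixed boundaries can split a word across two chunks, so the chunk size may need tuning for a given recording.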
Source: https://stackoverflow.com/questions/59020670/speech-recognition-duration-setting-issue-in-python