Question
I have a 6-second audio recording (ar-01.wav) in WAV format. I want to transcribe the audio file to text using Amazon services. For that purpose I created a bucket named test-voip and uploaded the audio file to it. When I try to convert the speech to text, the 6-second audio takes 13.12 seconds. Here is my code snippet:
import boto3
import requests

# aws_access_key_id and aws_secret_access_key are assumed to be defined elsewhere.
session = boto3.Session(aws_access_key_id=aws_access_key_id,
                        aws_secret_access_key=aws_secret_access_key)
transcribe = session.client('transcribe', region_name='us-east-1')
job_name = "audio_text_trail9"
job_uri = "https://test-voip.s3.amazonaws.com/ar-01.wav"
transcribe.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': job_uri},
    MediaFormat='wav',
    LanguageCode='en-US',
    MediaSampleRateHertz=16000
)
# Poll until the job reaches a terminal state.
while True:
    status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
    if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
        break
print("converted to text")
myurl = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
print(myurl)
# Download the transcript JSON and pull out the plain text.
Text_Data = (requests.get(myurl).json())['results']['transcripts'][0]['transcript']
print(Text_Data)
My code works fine and the accuracy is excellent even on noisy audio, but it takes too long. What am I doing wrong, and what makes the transcription so slow? Once I get the transcribed JSON, the time needed to extract the required information is negligible. How can I speed up the transcription, or is there a better way to do it?
Answer 1:
For me, AWS Transcribe took 20 minutes to transcribe a 17-minute file. One possible idea is to split the audio file into chunks and then use multiprocessing with 16 cores on EC2, for example on a g3.4xlarge instance.
Split the audio file into 16 parts with a silence threshold of -20 dB, then convert to .wav:
$ sudo apt-get install mp3splt
$ sudo apt-get install ffmpeg
$ mp3splt -s -p th=-20,nt=16 splitted.mp3
$ ffmpeg -i splitted.mp3 splitted.wav
Then use multiprocessing with 16 processes transcribing simultaneously, mapping a function that calls transcribe.start_transcription_job over each TranscriptionJobName and job_uri:
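import multiprocessing
import time

import boto3

output = []
data = range(0, 16)

def f(x):
    # One client per worker process; assumes AWS credentials and a default
    # region are already configured in the environment.
    transcribe = boto3.client('transcribe')
    job_name = "Name" + str(x)
    job_uri = "https://s3.amazonaws.com/bucket/splitted" + str(x) + ".wav"
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': job_uri},
        MediaFormat='wav',
        LanguageCode='pt-BR',
        OutputBucketName="bucket",
        MediaSampleRateHertz=8000,
        Settings={"MaxSpeakerLabels": 2,
                  "ShowSpeakerLabels": True})
    # Wait for this chunk's job to reach a terminal state.
    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)  # pause between polls instead of calling the API in a tight loop
    return status['TranscriptionJob']['TranscriptionJobStatus']

def mp_handler():
    # Run all 16 jobs at once, one per worker process.
    with multiprocessing.Pool(16) as p:
        r = p.map(f, data)
    return r

if __name__ == '__main__':
    output.append(mp_handler())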
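Because each worker above spends nearly all of its time waiting on the Transcribe API rather than using the CPU, a thread pool may be a lighter alternative to 16 processes. The following is a minimal sketch of that swapped-in approach, not the method from this answer. The names wait_for_job, run_all, start_job, and get_status are hypothetical, and the two callables are assumed to wrap the boto3 calls shown above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

TERMINAL_STATES = {"COMPLETED", "FAILED"}


def wait_for_job(get_status, job_name, poll_interval=5.0, timeout=3600.0,
                 sleep=time.sleep):
    """Poll get_status(job_name) until it returns a terminal state.

    get_status is a hypothetical callable returning the job status string.
    Raises TimeoutError once roughly `timeout` seconds of waiting have passed.
    """
    waited = 0.0
    while True:
        state = get_status(job_name)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout:
            raise TimeoutError(f"{job_name} still {state} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


def run_all(start_job, get_status, job_names, max_workers=16, **wait_kwargs):
    """Start every job and wait for all of them using a pool of threads.

    start_job is a hypothetical callable that submits one job by name.
    Returns a dict mapping each job name to its final status.
    """
    def run_one(name):
        start_job(name)
        return name, wait_for_job(get_status, name, **wait_kwargs)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, job_names))
```

With boto3, get_status could wrap transcribe.get_transcription_job(TranscriptionJobName=name)['TranscriptionJob']['TranscriptionJobStatus'], and start_job could wrap transcribe.start_transcription_job. Whether this shortens the total time still depends on how quickly the service handles short chunks.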
Answer 2:
I have searched for a transcription speed guarantee with no luck.
In this forum post (requires an AWS account), a poster reports a benchmark with the following results:
- A 10-minute clip took about 5 minutes
- 40-minute clips take around 17 minutes
- A 2-hour file took 36 minutes
What seems to be an official Amazon source states: "At this time, transcription speeds are better optimized for audio longer than 30 seconds. You'll start to see a better processing time to audio duration time ratio when the audio file length is about 2 minutes or longer. Having said this, we are working hard to enhance transcription speeds overall."
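To make the "processing time to audio duration" ratio concrete, here is a small sketch that computes it for the figures quoted in this thread. The helper name processing_ratio and the labels are only illustrative, and the numbers are the informal reports above, not official benchmarks.

```python
MIN = 60  # seconds per minute


def processing_ratio(audio_seconds, processing_seconds):
    """Processing time divided by audio duration.

    A value below 1.0 means the job finished faster than real time.
    """
    return processing_seconds / audio_seconds


# (audio duration, processing time) in seconds, as reported in this thread.
reports = {
    "6 s clip (question)": (6, 13.12),
    "17 min file (answer 1)": (17 * MIN, 20 * MIN),
    "10 min clip": (10 * MIN, 5 * MIN),
    "40 min clip": (40 * MIN, 17 * MIN),
    "2 h file": (120 * MIN, 36 * MIN),
}

ratios = {label: processing_ratio(audio, proc)
          for label, (audio, proc) in reports.items()}

for label, ratio in ratios.items():
    print(f"{label}: processing took {ratio:.3f}x the audio duration")
```

On these figures the ratio drops from roughly 2.2 for the 6-second clip to 0.3 for the 2-hour file, which is consistent with the statement that short clips are the worst case.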
I hope this helps others researching the issue.
Source: https://stackoverflow.com/questions/51929131/how-to-speed-up-processing-time-of-aws-transcribe