speech-to-text

How can I improve Watson Speech to Text accuracy?

心已入冬 submitted on 2019-12-01 00:36:30
Question: I understand that Watson Speech to Text is somewhat calibrated for colloquial conversation and for one or two speakers. I also know that it handles FLAC better than WAV and OGG. I would like to know how I could improve recognition, acoustically speaking. Does increasing the volume help? Would a compression filter or noise reduction help? What kind of preprocessing could help with this service?
Answer: The best way to improve the accuracy of the base models (which are very accurate but also very general) is to use the Watson STT customization service: https://www.ibm.com/watson

speech to text api other language android

跟風遠走 submitted on 2019-11-30 20:56:44
Question: I am developing an Android application that recognizes speech in Mandarin and outputs text, but I can't find out how to do that. Can someone give me example code for speech recognition in another language (Mandarin, French, etc.)?
public class MainActivity extends Activity {
    private TextView txtSpeechInput;
    private ImageButton btnSpeak;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
        txtSpeechInput = (TextView)
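
A minimal sketch of one way to approach the question above, assuming the app uses Android's built-in RecognizerIntent: the recognition language is selected by passing a BCP-47 tag such as zh-CN as RecognizerIntent.EXTRA_LANGUAGE. The SpeechLanguages helper and its name-to-locale table are hypothetical and only illustrate how the tag could be derived; the Android calls appear as comments because they need the Android SDK to compile.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of the Android SDK): maps a language name to the
// BCP-47 tag that would be passed as RecognizerIntent.EXTRA_LANGUAGE.
final class SpeechLanguages {
    private static final Map<String, Locale> LOCALES = new LinkedHashMap<>();

    static {
        LOCALES.put("mandarin", Locale.SIMPLIFIED_CHINESE); // zh-CN
        LOCALES.put("french", Locale.FRANCE);               // fr-FR
        LOCALES.put("english", Locale.US);                  // en-US
    }

    // Returns the language tag for a known language name (case-insensitive).
    static String languageTag(String name) {
        Locale locale = LOCALES.get(name.toLowerCase(Locale.ROOT));
        if (locale == null) {
            throw new IllegalArgumentException("Unknown language: " + name);
        }
        return locale.toLanguageTag();
    }

    // Inside an Activity (requires the Android SDK, so not compiled here):
    //   Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    //   intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
    //           RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    //   intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE,
    //           SpeechLanguages.languageTag("mandarin"));
    //   startActivityForResult(intent, REQ_CODE_SPEECH_INPUT); // request code is hypothetical
}
```

Whether a given language is actually recognized depends on the speech service installed on the device, so the tag is a request rather than a guarantee.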

Speech to text sdk freezes after video playback

蹲街弑〆低调 submitted on 2019-11-30 20:37:54
Question: I'm using the speech-to-text SDK provided by https://github.com/todoroo/iPhone-Speech-To-Text. The recognizer works just fine until I play back a video using MPMoviePlayerController. Here is the code I'm using to call the recognizer:
- (IBAction)actionBtRecognition:(id)sender {
    if (recognizer == nil) {
        recognizer = [[SpeechToTextModule alloc] init];
    }
    [recognizer beginRecording];
}
To play back the movie I used this tutorial. So, once I play back a movie and call the recognizer, it just freezes. When I

Large vocabulary speech recognition in iPhone without internet?

余生长醉 submitted on 2019-11-30 16:20:15
Question: I used OpenEars, which needs a dictionary. It is useful when the spoken word is in the dictionary, but I want to convert every word we speak. So I used Nuance's Dragon speech recognition SDK, but it communicates with a web server. I want to avoid server communication because of security concerns. Is it possible to convert speech to text for all spoken words, as on Windows Mobile, entirely offline without communicating with a server?
Answer: Speech recognition with unlimited vocabulary requires very large computational and memory resources (gigabytes of memory), and thus it's very hard to do on an iPhone on

Comparison of Speech Recognition use in Android: by Intent or on-thread?

风格不统一 submitted on 2019-11-30 10:51:14
Question: Introduction. Android provides two ways for me to use speech recognition. The first is by an Intent, as in this question: Intent example. A new Activity is pushed onto the top of the stack; it listens to the user, hears some speech, attempts to transcribe it (normally via the cloud), then returns the result to my app via an onActivityResult call. The second is by getting a SpeechRecognizer, like the code here: SpeechRecognizer example. Here, it looks like the speech is recorded and

how to detect language spoken in google cloud platform machine learning speech api

一世执手 submitted on 2019-11-30 08:32:21
Question: Is there an option to automatically detect the spoken language using Google Cloud Platform Machine Learning's Speech API? https://cloud.google.com/speech/docs/languages lists the supported languages, and the user needs to set this parameter manually to perform speech-to-text. Thanks, Mahesh
Answer 1: As of last month, Google added support for detecting spoken languages to its speech-to-text API (Google Cloud Speech v1p1beta1). It's a bit limited though: you have to provide a list
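
A rough sketch of what such a request could look like, assuming the v1p1beta1 REST endpoint https://speech.googleapis.com/v1p1beta1/speech:recognize and its RecognitionConfig fields languageCode and alternativeLanguageCodes. The RecognizeRequest class, the bucket URI, and the chosen language tags are illustrative assumptions; the code only builds the JSON body, does no JSON escaping, and leaves out authentication and the HTTP call.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds the JSON body for a v1p1beta1 speech:recognize
// request that names a primary language plus candidate alternatives.
final class RecognizeRequest {
    // Assumes the inputs are plain language tags and a plain gs:// URI,
    // so no JSON escaping is performed.
    static String body(String primaryLanguage, List<String> alternatives, String gcsUri) {
        String alts = alternatives.stream()
                .map(code -> "\"" + code + "\"")
                .collect(Collectors.joining(","));
        return "{\"config\":{"
                + "\"languageCode\":\"" + primaryLanguage + "\","
                + "\"alternativeLanguageCodes\":[" + alts + "]},"
                + "\"audio\":{\"uri\":\"" + gcsUri + "\"}}";
    }
}
```

The service then picks among the listed candidates, which matches the limitation mentioned in the answer: detection works over a supplied list, not over every supported language.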

Speech recognition response is poor in sphinx4

为君一笑 submitted on 2019-11-30 07:43:55
Question: Currently we are investigating using sphinx4 for speech recognition. We are trying to achieve good results for a dictation-type application. The input is a WAV file and we wish to transcribe it. I have looked at the LatticeDemo and Transcriber demo provided by Sphinx4. When I use the same configuration, the results are pretty poor. I have tried tweaking the configuration files, but it simply does not recognize the words. The Transcriber demo provided is for digits; I have modified the config file to understand words, but I am not sure if I am missing something. I have attached
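
The excerpt is cut off before any answer, so the cause of the poor results here is unknown. One common thing to rule out, offered as an assumption and not as the confirmed fix, is an audio format mismatch: the default Sphinx4 acoustic models expect 16 kHz, 16-bit, mono, signed little-endian PCM. The WavFormatCheck class below is a hypothetical helper built on the standard javax.sound.sampled API.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper: checks whether an audio format matches what the default
// Sphinx4 acoustic models are assumed to expect (16 kHz, 16-bit, mono PCM).
final class WavFormatCheck {
    static boolean matchesDefaultModel(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleRate() == 16000f
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1
                && !format.isBigEndian();
    }

    // Typical use with a file on disk ("input.wav" is a placeholder path):
    //   AudioFormat format = javax.sound.sampled.AudioSystem
    //           .getAudioFileFormat(new java.io.File("input.wav")).getFormat();
    //   boolean ok = WavFormatCheck.matchesDefaultModel(format);
}
```

If the check fails, resampling the file before transcription is usually simpler than adapting the recognizer configuration.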

Voice/Speech to text [closed]

♀尐吖头ヾ submitted on 2019-11-30 00:17:27
Question: I need an API or library (preferably free) that converts voice/speech from a microphone into text (a string). Additionally, I need an API or library that can do text-to-speech. I'd like to use C# and .NET, but other languages will suffice. Thanks.
Answer 1 (ShahidAzim): You can use CMU Sphinx, as it is a fairly open and scalable solution, and I think it can be used on both the client and the server side: http://cmusphinx.sourceforge.net/ If you are looking for a Microsoft desktop solution, you can use SAPI: http://msdn.microsoft.com/en-us/magazine/cc163663.aspx On the server side, you can use Microsoft

Large vocabulary speech recognition in iPhone without internet?

守給你的承諾、 submitted on 2019-11-29 23:13:10
Question: I used OpenEars, which needs a dictionary. It is useful when the spoken word is in the dictionary, but I want to convert every word we speak. So I used Nuance's Dragon speech recognition SDK, but it communicates with a web server. I want to avoid server communication because of security concerns. Is it possible to convert speech to text for all spoken words, as on Windows Mobile, entirely offline without communicating with a server?
Answer 1: Speech recognition with unlimited vocabulary requires very

How do I convert speech to text?

寵の児 submitted on 2019-11-29 18:59:58
Question: How could I take an MP3 and convert the speech to text? I've got some recorded notes from a conference and from meetings (there is a single voice on the recording, which is mine). I thought it would be easier and more intellectually interesting to convert them to text using speech-to-text tools rather than simply transcribing by hand. I know there are technologies out there, especially for VoIP applications using Asterisk and for podcasts, but what are they and how can I use them?
Answer 1: Open source: CMU Sphinx. Shareware: http://www.e-speaking.com/ (Windows). Commercial: Dragon NaturallySpeaking (Windows). .NET can