speech-recognition

Speaker Diarization from an Audio File on Android

此生再无相见时 submitted on 2020-01-06 05:25:16
Question: How do I separate different speakers in an audio file on Android? Google Cloud Speech API? (https://cloud.google.com/speech-to-text/docs/multiple-voices#speech-diarization-java) Possible duplicate of Speaker Diarization support in Google Speech API. I have tried the demo of the Google Cloud Speech-to-Text API but could not get it working; please check the error log from Logcat below. Code:
val content = latestAudioFile?.readBytes()
val inputStream = this.getAssets().open("XXXXX-6e000f81XXXX
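For reference, the page the question links to is the Java diarization sample. Below is a minimal Java sketch of that request pattern, assuming the google-cloud-speech client library (v1p1beta1), a 16 kHz LINEAR16 recording, and a hypothetical file name recording.wav; it is not the asker's actual code:

import com.google.cloud.speech.v1p1beta1.RecognitionAudio;
import com.google.cloud.speech.v1p1beta1.RecognitionConfig;
import com.google.cloud.speech.v1p1beta1.RecognizeResponse;
import com.google.cloud.speech.v1p1beta1.SpeakerDiarizationConfig;
import com.google.cloud.speech.v1p1beta1.SpeechClient;
import com.google.cloud.speech.v1p1beta1.SpeechRecognitionResult;
import com.google.cloud.speech.v1p1beta1.WordInfo;
import com.google.protobuf.ByteString;
import java.nio.file.Files;
import java.nio.file.Paths;

public class DiarizationSketch {
    public static void main(String[] args) throws Exception {
        byte[] content = Files.readAllBytes(Paths.get("recording.wav")); // hypothetical path
        try (SpeechClient speech = SpeechClient.create()) {
            // Ask the service to label each recognized word with a speaker tag.
            SpeakerDiarizationConfig diarization = SpeakerDiarizationConfig.newBuilder()
                    .setEnableSpeakerDiarization(true)
                    .setMinSpeakerCount(2)
                    .setMaxSpeakerCount(2)
                    .build();
            RecognitionConfig config = RecognitionConfig.newBuilder()
                    .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                    .setSampleRateHertz(16000) // must match the recording
                    .setLanguageCode("en-US")
                    .setDiarizationConfig(diarization)
                    .build();
            RecognitionAudio audio = RecognitionAudio.newBuilder()
                    .setContent(ByteString.copyFrom(content))
                    .build();
            RecognizeResponse response = speech.recognize(config, audio);
            // The last result carries the full word list with speaker tags attached.
            SpeechRecognitionResult last = response.getResults(response.getResultsCount() - 1);
            for (WordInfo word : last.getAlternatives(0).getWordsList()) {
                System.out.printf("speaker %d: %s%n", word.getSpeakerTag(), word.getWord());
            }
        }
    }
}

Note that SpeechClient.create() assumes application default credentials on a desktop or server JVM; on Android it is usually preferable to call the service from your own backend rather than bundling service-account credentials in the APK.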

Click a button on a website using JavaScript and speech recognition

陌路散爱 submitted on 2020-01-06 04:53:05
Question: Newbie here. I work on a website where I have to click buttons. I need to be able to click buttons based on their div ID using speech recognition. Let's say a clickable button div has an ID of "one"; I want to say "one" verbally and have the button clicked. I am guessing I need the JavaScript click function combined with a speech recognition API. I can handle using JavaScript to manipulate the HTML DOM, but how do I interface with an offline speech recognition API? Which one should I use and how do I

MS SAPI SpeechRecognitionEngine in C# gives a completely wrong transcription

廉价感情. submitted on 2020-01-06 03:19:30
Question: I'm new to MS SAPI and I'm trying to write a WAV-to-TXT conversion utility in C#/Windows Forms using the SpeechRecognitionEngine class. I've noticed the transcription is completely incorrect; the words don't even sound similar. I'm guessing this could be influenced by a long list of factors, such as the sound quality of the input WAV file and the grammar loaded into the recognition engine. I am using the DictationGrammar class. I'd appreciate any leads from seasoned speech recognition/digital signal

Convert Microsoft Project Oxford Speech Recognition from Objective-C to Swift

橙三吉。 submitted on 2020-01-06 02:53:09
Question: Microsoft Project Oxford has a nice Speech Recognition API with instructions for Objective-C on iOS. I built it easily by following the getting-started instructions. However, I am having a hard time converting it to Swift. I created a Swift project first, created the bridging header file (ProjectName-Bridging-Header.h), and inserted the following code into it: #import "SpeechRecognitionService.h" I want to convert both the Objective-C header and implementation files into ViewController.swift.

How do speech recognition algorithms recognize homophones?

大兔子大兔子 submitted on 2020-01-05 15:20:38
Question: I was pondering this question earlier. What clues do modern algorithms (specifically those that convert voice to text) use to determine which homophone was said (e.g., to, too, or two)? Do they use contextual clues? Sentence structure? Perhaps there are slight differences in the way each word is usually pronounced (for example, I usually hold the "o" sound longer in two than in to). A combination of the first two seems most plausible. Answer 1: Do they use contextual clues? Yes, ASR systems use
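The (truncated) answer points at contextual information: large-vocabulary recognizers score the competing word sequences with a statistical language model and keep the most probable one. As a rough illustration of that idea, not part of the original answer, here is a toy Java sketch with made-up bigram log-probabilities:

import java.util.HashMap;
import java.util.Map;

// Toy illustration: a decoder scores each homophone candidate with a language
// model and keeps the most probable word sequence. The numbers are invented.
public class HomophoneLm {
    static final Map<String, Double> BIGRAM_LOGPROB = new HashMap<>();
    static {
        BIGRAM_LOGPROB.put("want to", -1.0);
        BIGRAM_LOGPROB.put("want too", -6.0);
        BIGRAM_LOGPROB.put("want two", -5.0);
        BIGRAM_LOGPROB.put("to go", -1.5);
        BIGRAM_LOGPROB.put("too go", -8.0);
        BIGRAM_LOGPROB.put("two go", -7.5);
    }

    // Sum the bigram log-probabilities; unseen bigrams get a harsh back-off penalty.
    static double score(String[] words) {
        double logProb = 0.0;
        for (int i = 1; i < words.length; i++) {
            logProb += BIGRAM_LOGPROB.getOrDefault(words[i - 1] + " " + words[i], -10.0);
        }
        return logProb;
    }

    public static void main(String[] args) {
        String[][] candidates = {
            {"i", "want", "to", "go"},
            {"i", "want", "too", "go"},
            {"i", "want", "two", "go"},
        };
        for (String[] c : candidates) {
            System.out.printf("%-20s %.1f%n", String.join(" ", c), score(c));
        }
        // "i want to go" gets the least negative score, so the decoder picks "to".
    }
}

In a real system this language-model score is combined with the acoustic score, so small pronunciation differences can also contribute, but for true homophones the context is what normally decides.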

webkitSpeechRecognition no longer prompts for permission

我只是一个虾纸丫 submitted on 2020-01-05 04:57:25
Question: I've been prototyping a few pages that use webkitSpeechRecognition. I learned quickly that you cannot load these from a file; you have to serve them from a web server. I'm using macOS, so I just moved my files to the local Apache server that was already running and enabled. This worked fine for quite a while. For some reason, none of my pages that were working fine will now prompt me to allow/deny microphone use. I even copied an existing working page from another web server, and if I load it from http:

Sphinx 4 Failed to align audio to trancript

吃可爱长大的小学妹 submitted on 2020-01-05 04:18:07
Question: I am following Acoustic Model Adaptation using Sphinx 4 with the following WAV files. Here is the result I get when running:
bw -hmmdir wsj -moddeffn wsj/mdef -ts2cbfn .cont. -feat 1s_c_d_dd -cmn current -agc none -dictfn vn.dic -ctlfn lisp.fileids -lsnfn lisp.transcription -accumdir .
utt> 0 lisp_0001 53
INFO: cmn.c(175): CMN: 73.43 2.89 -0.34 -1.85 -0.98 -0.52 0.33 0.67 -0.77 -0.56 0.18 -0.50 -0.30
0 28 1 ERROR: "backward.c", line 421: Failed to align audio to trancript: final state of the
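This bw error usually means the transcript could not be force-aligned against the audio, for example because a transcription line does not match the recording, words are missing from vn.dic, or the audio was recorded at the wrong sample rate or format. One sanity check, not from the original question, is to decode each WAV with Sphinx4 and compare the hypothesis with the corresponding line in lisp.transcription. A minimal Java sketch, using the stock US English models from sphinx4-data and a hypothetical file name lisp_0001.wav:

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;
import java.io.FileInputStream;

public class DecodeCheck {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Stock models shipped with sphinx4-data; swap in your own model and dictionary as needed.
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        // The WAV should be 16 kHz, 16-bit, mono PCM for the default front end.
        recognizer.startRecognition(new FileInputStream("lisp_0001.wav")); // hypothetical name
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}

If the decoded hypothesis is wildly different from the transcription line, the adaptation data (audio format, dictionary, or transcript) is the thing to fix before rerunning bw.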

Creating custom voice commands (GNU/Linux)

非 Y 不嫁゛ submitted on 2020-01-04 13:10:28
Question: I'm looking for advice for a personal project. I'm attempting to create software for defining customized voice commands. The goal is to allow the user (me) to record some audio data (2-3 seconds) to define commands/macros. Then, when the user speaks the same phrase again, the command/macro will be executed. The software must be able to detect a command in less than 1 second of processing time on a low-cost computer (a Raspberry Pi, for example). I have already searched in two directions: - Speech
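Since this is about matching a short user-recorded phrase against stored templates rather than full speech-to-text, one classic lightweight approach that fits a Raspberry Pi budget is dynamic time warping (DTW) over per-frame features such as MFCCs. The following is a toy Java sketch of only the matching step, with made-up feature vectors standing in for real MFCC frames; it is one possible direction, not something from the original post:

// Dynamic-time-warping distance between two feature sequences (e.g., MFCC frames),
// the classic technique for matching a short spoken command against templates.
public class DtwMatcher {
    // Euclidean distance between two feature frames of equal dimension.
    static double frameDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Standard DTW with insertion/deletion/match moves, normalised by path length.
    static double dtw(double[][] query, double[][] template) {
        int n = query.length, m = template.length;
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) java.util.Arrays.fill(row, Double.POSITIVE_INFINITY);
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                double d = frameDistance(query[i - 1], template[j - 1]);
                cost[i][j] = d + Math.min(cost[i - 1][j - 1],
                                 Math.min(cost[i - 1][j], cost[i][j - 1]));
            }
        }
        return cost[n][m] / (n + m);
    }

    public static void main(String[] args) {
        // Tiny fake 2-dimensional "feature" sequences standing in for MFCC frames.
        double[][] spoken   = {{0.1, 0.9}, {0.2, 0.8}, {0.9, 0.1}};
        double[][] tplOpen  = {{0.1, 0.9}, {0.25, 0.8}, {0.85, 0.15}};
        double[][] tplClose = {{0.9, 0.1}, {0.8, 0.2}, {0.1, 0.9}};
        System.out.printf("distance to 'open'  template: %.3f%n", dtw(spoken, tplOpen));
        System.out.printf("distance to 'close' template: %.3f%n", dtw(spoken, tplClose));
        // The command whose template has the smallest distance wins, subject to a threshold.
    }
}

An alternative direction is keyword spotting with an offline recognizer such as CMU PocketSphinx, which trades per-user audio templates for a fixed dictionary of command words.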