speech-to-text

Watson Speech-to-Text register_callback returns only 400s

Posted by 杀马特。学长 韩版系。学妹 on 2019-12-11 07:34:32
Question: The Watson Speech-to-Text asynchronous HTTP interface allows one to register a callback URL through a call to register_callback. This call is clearly not working; for illustration, please see the snippet below.

    # Illustration of how I can't get the Watson Speech-to-Text
    # register_callback call to work.
    import urllib   # Python 2 style, to match urllib.urlencode below
    import requests

    r = requests.post(
        "https://stream.watsonplatform.net/speech-to-text/api/v1/register_callback?{0}".format(
            urllib.urlencode({"callback_url": callback_url})),
        auth=(watson_username, watson_password))  # the excerpt was cut here; completed from context
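One common cause of a 400 from register_callback is that the service cannot white-list the callback URL: Watson sends a GET request with a challenge_string query parameter to the URL being registered, and the server behind that URL must echo the string back as plain text. Below is a minimal sketch of such a challenge responder, using only the Python 3 standard library; the port and the choice of http.server over a web framework are assumptions.

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    class ChallengeHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Watson's white-listing probe arrives as ?challenge_string=...
            query = parse_qs(urlparse(self.path).query)
            challenge = query.get("challenge_string", [""])[0]
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            # Echo the challenge back verbatim so the URL gets white-listed.
            self.wfile.write(challenge.encode("utf-8"))

    HTTPServer(("", 8080), ChallengeHandler).serve_forever()

If the challenge exchange never succeeds, register_callback keeps failing no matter how well-formed the registration request itself is.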

Java speech API null response

Posted by 痴心易碎 on 2019-12-11 06:53:07
Question: I am using the Java speech recognition API Jarvis, located at https://github.com/lkuza2/java-speech-api. However, when I run my application I get an error:

    Server returned HTTP response code: 400 for URL:
    https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US&maxresults=1

(This is the URL the API uses to get a response from Google.) I also created an API key as mentioned in the earlier posts and tried to use the v2 URL: www.google.com/speech…
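The v1 endpoint that Jarvis targets was shut down by Google, which is why every request now returns a 400 regardless of payload. The unofficial v2 endpoint still expects FLAC audio plus a developer API key. A hedged sketch of that v2 call follows; the key, file name, and sample rate are placeholders, and the endpoint is unofficial and heavily rate-limited.

    import requests

    API_KEY = "YOUR_API_KEY"  # placeholder: a key with the Speech API enabled
    URL = ("https://www.google.com/speech-api/v2/recognize"
           "?output=json&lang=en-US&key=" + API_KEY)

    with open("hello.flac", "rb") as f:
        r = requests.post(
            URL,
            headers={"Content-Type": "audio/x-flac; rate=16000"},  # rate must match the file
            data=f)
    print(r.status_code)
    print(r.text)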

Difference in word confidence in IBM Watson Speech to text

Posted by 南笙酒味 on 2019-12-11 06:26:23
Question: I am using the Node SDK to call the IBM Watson speech-to-text module. After sending an audio sample and receiving a response, the confidence values look weird:

    {
      "results": [
        {
          "word_alternatives": [
            {
              "start_time": 3.31,
              "alternatives": [
                { "confidence": 0.7563, "word": "you" },
                { "confidence": 0.0254, "word": "look" },
                { "confidence": 0.0142, "word": "Lou" },
                { "confidence": 0.0118, "word": "we" }
              ],
              "end_time": 3.43
            },
            ...
          ],
          "alternatives": [
            {
              "word_confidence": [
                [ "you", 0…
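The two numbers measure different things, which is why they diverge: the entries under word_alternatives are confusion-network hypotheses whose confidences behave like posterior probabilities for that time slot, while word_confidence under alternatives scores each word of the single best transcript. A small self-contained sketch that pulls the top hypothesis out of each slot (the embedded JSON is a trimmed copy of the response above, so the slot sum here falls short of 1.0 only because the remaining alternatives were cut from the excerpt):

    import json

    raw_json = """{ "results": [ { "word_alternatives": [ {
        "start_time": 3.31, "end_time": 3.43, "alternatives": [
          { "confidence": 0.7563, "word": "you" },
          { "confidence": 0.0254, "word": "look" },
          { "confidence": 0.0142, "word": "Lou" },
          { "confidence": 0.0118, "word": "we" } ] } ] } ] }"""

    response = json.loads(raw_json)
    for result in response["results"]:
        for slot in result.get("word_alternatives", []):
            total = sum(a["confidence"] for a in slot["alternatives"])
            best = max(slot["alternatives"], key=lambda a: a["confidence"])
            print("%.2f-%.2fs best=%r (%.4f); slot sums to ~%.2f"
                  % (slot["start_time"], slot["end_time"],
                     best["word"], best["confidence"], total))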

Swift 3.0 Speech to Text: Changing Color of Words

Posted by 瘦欲@ on 2019-12-11 04:48:56
Question: I'm trying to change the color of particular words (e.g. happy, sad, angry) in a text field populated from spoken input. It doesn't work if a word is spoken more than once. For example, if I say "I'm feeling happy because my cat is being nice to me. My brother is making me sad. I'm happy again.", only the first 'happy' changes color, and I'm not exactly sure why.

    func setTextColor(text: String) -> NSMutableAttributedString {
        let string: NSMutableAttributedString = NSMutableAttributedString(string…
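The usual cause of this symptom is searching with an API that returns only the first match (Swift's range(of:) behaves this way), so only the first 'happy' ever receives the color attribute. The fix, in any language, is to iterate over every occurrence; here is a sketch of that loop in Python for illustration (the Swift equivalent would repeatedly call range(of:options:range:) while advancing the search range past each match):

    import re

    def all_ranges(text, word):
        # Every occurrence, not just the first match.
        return [m.span() for m in re.finditer(re.escape(word), text)]

    sentence = ("I'm feeling happy because my cat is being nice to me. "
                "My brother is making me sad. I'm happy again.")
    for word in ("happy", "sad"):
        print(word, all_ranges(sentence, word))  # one (start, end) span per occurrence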

What audio formats are supported by Azure Cognitive Services' Speech Service (STT)?

Posted by 风格不统一 on 2019-12-11 03:08:13
Question: Bearing in mind that the Microsoft/Azure Cognitive Services Speech Service is currently going through a rationalisation exercise, as far as I can tell from

    https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/rest-apis#speech-to-text
    https://docs.microsoft.com/en-us/azure/cognitive-services/speech/home

only .wav binaries are accepted, with anything else giving the response:

    {"Message":"Unsupported audio format"}

Is there any other way to discover the supported formats?
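Short of the documentation, the practical check is empirical: send a short clip in the format under test and look for the "Unsupported audio format" message. A sketch against the short-audio REST endpoint follows; the region, key, and file name are placeholders, and at the time of the question WAV/PCM was the safe, documented choice.

    import requests

    REGION = "westus"                 # placeholder: your service region
    KEY = "YOUR_SUBSCRIPTION_KEY"     # placeholder

    url = ("https://{0}.stt.speech.microsoft.com/speech/recognition/"
           "conversation/cognitiveservices/v1?language=en-US".format(REGION))
    headers = {
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
    }
    with open("sample.wav", "rb") as f:
        r = requests.post(url, headers=headers, data=f)
    print(r.status_code)
    print(r.text)  # an unsupported container/codec comes back as an error message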

Azure Speech To Text: Conversation Transcription UserId always returns $ref$

Posted by 喜欢而已 on 2019-12-11 01:07:42
Question: I am using the sample code to transcribe a conversation, but on the recognized event I always get $ref$ when calling e.Result.UserId. I use 16-bit samples, a 16 kHz sample rate, and a single channel (mono) for voice signatures, and 32-bit samples, a 32 kHz sample rate, and a single channel (mono) for transcribing conversations. All code is from: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-use-conversation-transcription-service. Any ideas? Or a .wav sample…
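$ref$ is effectively the "speaker not yet identified" placeholder: until the service matches incoming audio against an enrolled voice signature, no user id is resolved. The question's sample is C#; below is a rough Python-SDK sketch of the same transcription loop for comparison. The class and property names follow recent versions of the azure-cognitiveservices-speech package and should be treated as an assumption, as should the key, region, and file name.

    import time
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
    audio_config = speechsdk.audio.AudioConfig(filename="meeting.wav")
    transcriber = speechsdk.transcription.ConversationTranscriber(
        speech_config=speech_config, audio_config=audio_config)

    def on_transcribed(evt):
        # The speaker id stays a placeholder until a voice-signature match succeeds.
        print(evt.result.text, evt.result.speaker_id)

    transcriber.transcribed.connect(on_transcribed)
    transcriber.start_transcribing_async().get()
    time.sleep(30)  # crude wait; real code should key off the session_stopped event
    transcriber.stop_transcribing_async().get()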

How to extract the values that return from the createRecognizeStream() method?

Posted by 走远了吗. on 2019-12-11 00:42:58
Question: Using the Watson Speech to Text service, how do I extract the values returned by the createRecognizeStream() method? Here is a chunk of the sample code. I am trying to see the interim results in the terminal, but all I get is the following. How do I set the options so that the full results appear?

    { results: [ { alternatives: [Object], final: false } ], result_index: 0 }
    { results: [ { alternatives: [Object], final: false } ], result_index: 0 }
    { results: [ { alternatives: [Object], final: false } ]…

they…
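The alternatives: [Object] lines are not the service truncating anything: Node's console.log only prints nested objects to a shallow depth, so the values are present but hidden. Printing JSON.stringify(data, null, 2), or util.inspect(data, { depth: null }), reveals the full structure. For comparison, here is a sketch of the same full-JSON dump through the Watson Python SDK (the API key and file name are placeholders):

    import json
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    stt = SpeechToTextV1(authenticator=IAMAuthenticator("YOUR_APIKEY"))  # placeholder key
    with open("audio.flac", "rb") as audio:
        result = stt.recognize(audio=audio, content_type="audio/flac",
                               timestamps=True, word_confidence=True).get_result()
    print(json.dumps(result, indent=2))  # every nested field, nothing collapsed to [Object]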

AndroidPocketSphinx: How does the system know which recognizer is invoked?

Posted by 爷，独闯天下 on 2019-12-10 17:34:37
Question: I am studying the source code of TestPocketSphinxAndAndroidASR.java, and the first thing that is not clear to me is how the system knows which recognizer (i.e. Google or CMUSphinx) to invoke. I can see that the recognition activity is started by:

    Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
    startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);

but as far as I…

Google speech API - PHP does not return anything

Posted by Deadly on 2019-12-10 17:23:05
Question: My code is based on this PHP version of the full-duplex Google Speech API for speech-to-text: http://mikepultz.com/2013/07/google-speech-api-full-duplex-php-version/. I have a few FLAC files that work and give the array output described in Mike's post, but for a few other FLAC files nothing is returned at all. For example, http://gavyadhar.com/video/upload/Pantry_Survey.flac returns no output, yet the same code works for this FLAC file: http://gavyadhar.com/video/upload/pantry…
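When one FLAC works and another silently returns nothing, the first things to check are the file's sample rate and channel count, since scripts like Mike's hard-code a rate parameter and the API expects mono audio. A small pure-Python sketch that reads both values straight out of the FLAC STREAMINFO header (field offsets per the FLAC spec; treat this as a diagnostic, not a guarantee that the rate is the only problem):

    def flac_rate_and_channels(path):
        with open(path, "rb") as f:
            assert f.read(4) == b"fLaC", "not a FLAC file"
            f.read(4)              # metadata block header; STREAMINFO comes first
            info = f.read(34)      # the 34-byte STREAMINFO block
        # Sample rate: 20 bits starting at byte 10; channels: next 3 bits, stored minus one.
        rate = (info[10] << 12) | (info[11] << 4) | (info[12] >> 4)
        channels = ((info[12] >> 1) & 0x07) + 1
        return rate, channels

    print(flac_rate_and_channels("Pantry_Survey.flac"))  # e.g. (44100, 2) would explain the silence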

Improve Android speech recognition with additional context

Posted by 喜夏-厌秋 on 2019-12-10 17:00:51
Question: As I understand it, the Android API uses Google's speech recognition service for speech-to-text. I've studied the API and it is pretty simple: it just converts voice into an array of words. Is there any way to improve the recognition? I mean, if I know the context, can I send some parameters to the service in order to improve the recognition? Alternatively, is there any other speech recognition service that can be used for this purpose? Thanks in advance.

Answer 1: "Is any way to improve the recognition, I mean, if I know…
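The stock Android RecognizerIntent exposes no real context hook, but one way to supply such context (independent of whatever the truncated answer above goes on to recommend) is Google's Cloud Speech-to-Text API, a separate server-side, billable service that accepts phrase hints biasing recognition toward expected vocabulary. A sketch using the google-cloud-speech Python client; the credentials setup, file name, and phrases are placeholders.

    from google.cloud import speech

    client = speech.SpeechClient()  # assumes GOOGLE_APPLICATION_CREDENTIALS is configured
    with open("command.wav", "rb") as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
        # Phrase hints: the "context" the question asks about.
        speech_contexts=[speech.SpeechContext(phrases=["turn on the light", "set an alarm"])],
    )
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print(result.alternatives[0].transcript)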