speech-to-text | 易学教程

How to disable sentence-level auto correction in Google Cloud Speech-to-Text API

Question: I am working on a speech recognition task that involves detecting children's speaking capability and its improvement over time ... I'd like to use the Google Cloud Speech-to-Text API for the ASR part of the detection. Then I would use the transcripts of different measurements to estimate the advancement. But! The sentence-level autocorrect of the Google Speech API consistently rewrites the earlier part of the spoken sentence... Is there a way to disable the autocorrect of this ASR? I can't

Windows Universal App Continuous Dictation Without Network

Question: The samples provided here: https://github.com/Microsoft/Windows-universal-samples give a great overview of some of the capabilities of UWP apps. But the speech example seems to require an active connection to the internet. Does anyone know if this capability is possible on a disconnected corporate network? Thanks, JRF

Answer 1: Yes, Examples 1, 2, 5, 6 and 9 work offline since they don't use the predefined SRGS scenarios. Look in the SRGS folder to see the SRGS files. You will have to

Configuring the length of utterance and pauses in Android's speech recognizer

Question: I use Android's speech-to-text API to speak something to the phone and convert it into text. By default, if one stops speaking into the microphone, the API assumes that the user is done talking and returns the text from the input speech. For my application, the user might have long pauses between her consecutive sentences. How can I configure Android's speech-to-text API to consider the end of the speech only when I ask it to and not do that as soon as the speaker takes a small pause between

Getting WAV file transcription to work with Sphinx4

Question: I've got Sphinx-4 installed on my Windows XP system and JSAPI set up. I'd like to transcribe a WAV (or MP3) file of spoken English to text. When I run the "WavFile" demo, it runs successfully:

    java -jar WavFile.jar

But when I pass my own WAV file like this:

    java -jar WavFile.jar c:\test.wav

I get: Loading Recognizer as defined in 'jar:file:/C:/sphinx4-1.0beta3-bin/sphinx4-1.0beta3/bin/WavFile.jar!/edu/cmu/sphinx/demo/wavfile/config.xml'... Decoding jar:file:/C:/sphinx4-1.0beta3-bin/sphinx4-1
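
The excerpt is cut off before the actual error, so the cause is unknown. One thing that may be worth ruling out is a format mismatch between the custom file and what the recognizer is configured for. The sketch below assumes the demo expects 16 kHz, 16-bit, mono PCM (an assumption, not something the excerpt states) and uses Python's standard `wave` module to report where a file differs. The names `read_wav_format`, `format_mismatches`, and `EXPECTED` are made up for this illustration.

```python
import io
import wave

# Assumed target format (not stated in the excerpt): 16 kHz, 16-bit, mono PCM.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def read_wav_format(source):
    """Return the basic format of a PCM WAV given a file path or raw bytes."""
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def format_mismatches(actual, expected=EXPECTED):
    """List the fields where the file differs from the expected format."""
    return [
        f"{key}: got {actual[key]}, expected {value}"
        for key, value in expected.items()
        if actual[key] != value
    ]
```

Calling `format_mismatches(read_wav_format(r"c:\test.wav"))` would return an empty list if the file already matches the assumed format. The `wave` module only reads uncompressed WAV, so an MP3 would have to be converted first.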

How can I access IBM speech-to-text api with curl?

Question: I cannot access the speech-to-text API on IBM Bluemix with curl! I tried the example from the documentation for a sessionless request with curl and it didn't work; I got an invalid userID/password message. Here is the error I got: "{ "code" : 401 , "error" : "Not Authorized" , "description" : "2016-10-08T15:22:37-04:00, Error ERCDPLTFRM-DNLKUPERR occurred when accessing https://158.85.132.94:443/speech-to-text/api/v1/recognize?timestamps=true&word_alternatives_threshold=0.9&continuous=true,
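
A 401 like this means the credentials were rejected. A frequent cause with Bluemix services, though the truncated excerpt does not confirm it here, is passing the Bluemix account login instead of the username and password from the service's own credentials. The sketch below shows what curl's `-u user:password` amounts to: an HTTP Basic `Authorization` header. It only builds the request and does not send it. The host is a placeholder, the credentials are hypothetical, and `basic_auth_header` and `build_recognize_request` are names invented for this example; only the `/speech-to-text/api/v1/recognize` path comes from the error message above.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the HTTP Basic value that curl's -u option would send."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_recognize_request(url, username, password, audio_bytes,
                            content_type="audio/wav"):
    """Prepare (but do not send) a POST request carrying the audio payload."""
    request = urllib.request.Request(url, data=audio_bytes, method="POST")
    request.add_header("Authorization", basic_auth_header(username, password))
    request.add_header("Content-Type", content_type)
    return request


# Placeholder host and hypothetical service credentials, for illustration only.
example = build_recognize_request(
    "https://example.invalid/speech-to-text/api/v1/recognize",
    "service-username",
    "service-password",
    b"",
)
```

Sending the prepared request would be a call to `urllib.request.urlopen(example)` once a real endpoint and real service credentials are filled in.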

Small-size speech recognition on Android to look for keywords

Question: I'm developing a voice command app and need to use speech-to-text in Android. I want my app to work offline. So far that is possible only in the Jelly Bean version, and it requires a huge database to be downloaded and kept on the device. But I don't need the whole database; I just want a few keywords for the conversions. Is it possible to record .wav files on our own and set each one as the reference for a particular word, so that when a voice input is given we could match the two voice tracks and recognize the corresponding

How can I transcribe a speech file with the Bing Speech API in Python?

Question: How can I transcribe a speech file with the Bing Speech API in Python? My speech file is longer than 15 seconds. I'm aware that one may use the Bing Speech REST API in Python. https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065 gives an example in Python 2:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import requests
    import httplib
    import uuid
    import json

    class Microsoft_ASR():
        def __init__(self):
            self.sub_key = 'YourKeyHere'
            self.token = None
            pass

        def get_speech_token(self):
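
One possible workaround for the length limit implied by the question, offered as a sketch and not as the approach from the linked gist, is to cut the recording into pieces shorter than 15 seconds and submit each piece separately. The example below uses only Python 3's standard `wave` module and makes no API calls; `split_wav` and its `max_seconds` parameter are names chosen for this illustration.

```python
import io
import wave


def split_wav(wav_bytes, max_seconds=10.0):
    """Split PCM WAV bytes into a list of WAV byte strings,
    each at most max_seconds long."""
    chunks = []
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        # Number of frames that fit in one chunk; never less than one.
        frames_per_chunk = max(1, int(params.framerate * max_seconds))
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the source format so each chunk is a valid WAV file.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Cutting at fixed time boundaries can split a word in half, so splitting at silences would likely give better transcripts. That is left out here to keep the sketch short.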

Recorded sound file (ala google now, google keep) - RecognizerIntent/Listener

Question: I have been developing an application that uses the RecognizerIntent to get voice input. However, since Jelly Bean was launched, I have not been able to get the actual sound file from my voice input. In the RecognitionListener (http://developer.android.com/reference/android/speech/RecognitionListener.html) there is a method called onBufferReceived. However, there are no promises that this method will be called, and when I implemented it, it never got called. Is there any way to force this

Speech Recognition Service in Android

Question: I have an Android application that uses speech recognition in an Activity. The GUI doesn't do anything except contain the speech recognition objects. I would like to port this over to a Service so I can talk to the application while it's running in the background. However, as far as I know, the speech recognition service has to use onActivityResult, which is unavailable for Services. Is there a way to either contain an Activity in a Service such that its GUI is not displayed, or perform

How to convert human voice into digital format?

Question: I am working on a project where a biometric system is used to secure the system. We are planning to use the human voice for this. The idea is to let the person say some words or sentences, and the system will store that voice in digital format. The next time the person wants to enter the system, he/she has to speak some words, which may or may not be different from the words used earlier. We don't want to match words but want to match voice frequency. I have read some research papers regarding this