Google Cloud Speech API word hints

孤独总比滥情好 2020-12-19 21:23

Can you give an example of using word hints in the Google Cloud Speech API? I tried to use the REST API executor with brook.flac. I input the phrase Brooklin (instead of Brooklyn) but the

1 Answer
  • 2020-12-19 21:53

    From https://cloud.google.com/speech-to-text/docs/speech-adaptation

    For any given recognition task, you may also pass a speechContext (of type SpeechContext) that provides information to aid in processing the given audio. Currently, a context can hold a list of phrases to act as "hints" to the recognizer; these phrases can boost the probability that such words or phrases will be recognized.

    You may use these phrase hints in a few ways:

    • Improve the accuracy for specific words and phrases that may tend to be overrepresented in your audio data. For example, if specific commands are typically spoken by the user, you can provide these as phrase hints. Such additional phrases may be particularly useful if the supplied audio contains noise or the contained speech is not very clear.
    • Add additional words to the vocabulary of the recognition task. The Cloud Speech API includes a very large vocabulary. However, if proper names or domain-specific words are out-of-vocabulary, you can add them to the phrases provided to your request's speechContext.

    Phrases may be provided either as small groups of words or as single words. (See Content Limits for limits on the number and size of these phrases.) When provided as multi-word phrases, hints boost the probability of recognizing those words in sequence but also, to a lesser extent, boost the probability of recognizing portions of the phrase, including individual words.
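
    As a rough illustration of how hints end up in a request, here is a minimal Python sketch that assembles a recognize request body. The helper name build_recognize_request and its defaults are assumptions made for this example and are not part of any Google client library; the field names follow the JSON examples in this answer. speechContexts is written as a list here, since the REST reference defines it as a repeated field.

```python
import json


def build_recognize_request(audio_uri, phrases, language_code="en-US",
                            encoding="FLAC", sample_rate_hertz=16000):
    """Assemble a recognize request body with optional phrase hints.

    Hypothetical helper for illustration; not a Google library function.
    """
    config = {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
    }
    # Drop empty strings and duplicates while keeping the original order.
    hints = list(dict.fromkeys(p.strip() for p in phrases if p.strip()))
    if hints:
        # speechContexts is a repeated field, so it is sent as a list.
        config["speechContexts"] = [{"phrases": hints}]
    return {"config": config, "audio": {"uri": audio_uri}}


body = build_recognize_request(
    "gs://speech-demo/shwazil_hoful.flac", ["hoful", "shwazil", "hoful"]
)
print(json.dumps(body, indent=2))
```

    Single words and multi-word phrases can go into the same phrases list; the helper leaves speechContexts out entirely when no hints are given.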

    For example, this shwazil_hoful.flac file contains some made-up words. If recognition is performed without supplying these out-of-vocabulary words, the recognizer will not return the desired transcript, but instead return words that are in vocabulary, such as: "it's a swallow whole day".

    {
      "config": {
        "encoding":"FLAC",
        "sampleRateHertz": 16000,
        "languageCode":"en-US"
      },
      "audio":{
        "uri":"gs://speech-demo/shwazil_hoful.flac"
      }
    }
    

    However, when these out-of-vocabulary words are supplied with the recognition request, the recognizer will return the desired transcript: "it's a shwazil hoful day".

    {
      "config": {
        "encoding":"FLAC",
        "sampleRateHertz": 16000,
        "languageCode":"en-US",
        "speechContexts": [{
          "phrases":["hoful","shwazil"]
        }]
      },
      "audio":{
        "uri":"gs://speech-demo/shwazil_hoful.flac"
      }
    }
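
    If you would rather send such a request from code than from the REST API executor, the following Python sketch shows one possible way using only the standard library. It assumes the v1 speech:recognize endpoint and API-key authentication via the key query parameter; make_recognize_call and transcripts are hypothetical helper names, and YOUR_API_KEY is a placeholder. The network call itself is left commented out because it needs a valid key.

```python
import json
import urllib.parse
import urllib.request

# REST endpoint for synchronous recognition (v1).
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def make_recognize_call(body, api_key):
    """Build (but do not send) the HTTP POST for a recognize request."""
    query = urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        f"{RECOGNIZE_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcripts(response):
    """Return the top transcript of each result in a parsed response."""
    return [
        result["alternatives"][0]["transcript"]
        for result in response.get("results", [])
        if result.get("alternatives")
    ]


body = {
    "config": {
        "encoding": "FLAC",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        # Repeated field, so a list of SpeechContext objects.
        "speechContexts": [{"phrases": ["hoful", "shwazil"]}],
    },
    "audio": {"uri": "gs://speech-demo/shwazil_hoful.flac"},
}
request = make_recognize_call(body, api_key="YOUR_API_KEY")  # placeholder key

# Sending requires a valid API key and network access:
# with urllib.request.urlopen(request) as resp:
#     print(transcripts(json.load(resp)))
```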
    

    Alternatively, if certain words are typically said together in a phrase, they can be grouped together, which may further increase the confidence that they will be recognized.

    {
      "config": {
        "encoding":"FLAC",
        "sampleRateHertz": 16000,
        "languageCode":"en-US",
        "speechContexts": [{
          "phrases":["shwazil hoful day"]
        }]
      },
      "audio":{
        "uri":"gs://speech-demo/shwazil_hoful.flac"
      }
    }
    

    In general, be sparing when providing speech context hints. Better recognition accuracy can be achieved by limiting phrases to only those expected to be spoken. For example, if there are multiple dialog states or device operating modes, provide only the hints that correspond to the current state, rather than always supplying hints for all possible states.
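
    One way to keep hints limited to the current state is to store them per state and build the config on demand. The Python sketch below illustrates this; the state names, the phrases, and the helper config_for are all made up for the example.

```python
# Hypothetical dialog states and phrases, for illustration only.
HINTS_BY_STATE = {
    "main_menu": ["check balance", "transfer money", "talk to an agent"],
    "confirm": ["yes", "no", "cancel"],
}


def config_for(state, language_code="en-US"):
    """Return a recognition config carrying only the current state's hints."""
    config = {
        "encoding": "FLAC",
        "sampleRateHertz": 16000,
        "languageCode": language_code,
    }
    phrases = HINTS_BY_STATE.get(state, [])
    if phrases:
        # Only the phrases expected in this state are sent as hints.
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return config


print(config_for("confirm"))
```

    A state with no registered phrases yields a config without speechContexts, so the recognizer falls back to its default vocabulary.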
