Question
It seems possible to differentiate among speakers/users with the Watson-Unity-SDK: it can return an array that identifies which words were spoken by which speakers in a multi-person exchange. However, I cannot figure out how to execute it, particularly when I am sending different utterances (spoken by different people) to the Assistant service so that it responds to each accordingly.
The code snippets for parsing Assistant's JSON output/response, as well as OnRecognize, OnRecognizeSpeaker, SpeechRecognitionResult, and SpeakerLabelsResult, are there, but how do I get Watson to return this from the server when an utterance is recognized and its intent is extracted?
Both OnRecognize and OnRecognizeSpeaker are used only once, in the Active property, so both are registered, but only OnRecognize does the Speech-to-Text (transcription); OnRecognizeSpeaker is never fired...
public bool Active
{
    get
    {
        return _service.IsListening;
    }
    set
    {
        if (value && !_service.IsListening)
        {
            _service.RecognizeModel = (string.IsNullOrEmpty(_recognizeModel) ? "en-US_BroadbandModel" : _recognizeModel);
            _service.DetectSilence = true;
            _service.EnableWordConfidence = true;
            _service.EnableTimestamps = true;
            _service.SilenceThreshold = 0.01f;
            _service.MaxAlternatives = 0;
            _service.EnableInterimResults = true;
            _service.OnError = OnError;
            _service.InactivityTimeout = -1;
            _service.ProfanityFilter = false;
            _service.SmartFormatting = true;
            _service.SpeakerLabels = false;
            _service.WordAlternativesThreshold = null;
            _service.StartListening(OnRecognize, OnRecognizeSpeaker);
        }
        else if (!value && _service.IsListening)
        {
            _service.StopListening();
        }
    }
}
Typically, the output of Assistant (i.e. its result) is something like the following:
Response: {"intents":[{"intent":"General_Greetings","confidence":0.9962662220001222}],"entities":[],"input":{"text":"hello eva"},"output":{"generic":[{"response_type":"text","text":"Hey!"}],"text":["Hey!"],"nodes_visited":["node_1_1545671354384"],"log_messages":[]},"context":{"conversation_id":"f922f2f0-0c71-4188-9331-09975f82255a","system":{"initialized":true,"dialog_stack":[{"dialog_node":"root"}],"dialog_turn_counter":1,"dialog_request_counter":1,"_node_output_map":{"node_1_1545671354384":{"0":[0,0,1]}},"branch_exited":true,"branch_exited_reason":"completed"}}}
I have set up intents and entities, and this list is returned by the Assistant service, but I am not sure how to get it to also consider my entities, or how to get it to respond accordingly when the STT recognizes different speakers.
I would appreciate some help, particularly how to do this via Unity scripting.
Answer 1:
I had the exact same question about dealing with the Assistant's messages, so I looked at the Assistant.OnMessage() method, which returns a string like "Response: {0}", customData["json"].ToString(), plus the JSON output, which will be something like this:
[Assistant.OnMessage()][DEBUG] Response: {"intents":[{"intent":"General_Greetings","confidence":1}],"entities":[],"input":{"text":"hello"},"output":{"text":["good evening"],"nodes_visited": etc...}
I personally parse the JSON in order to extract the content from messageResponse.Entities. In the above example, you can see that the array is empty, but if you are populating it, then that is where you need to extract the values from, and then in your code you can do what you want with them.
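As a rough sketch of that parsing step (these are hand-rolled classes mirroring the response shown in your question, not SDK types, and I am assuming the raw JSON string comes from customData["json"].ToString() inside your OnMessage callback), Unity's built-in JsonUtility is enough:
using System;
using UnityEngine;

// Only the fields we care about (intents and entities) are declared;
// JsonUtility ignores the rest of the response.
[Serializable]
public class AssistantIntent
{
    public string intent;
    public float confidence;
}

[Serializable]
public class AssistantEntity
{
    public string entity;
    public string value;
    public float confidence;
}

[Serializable]
public class AssistantMessage
{
    public AssistantIntent[] intents;
    public AssistantEntity[] entities;
}

public static class AssistantJsonParser
{
    // rawJson is assumed to be customData["json"].ToString() from OnMessage.
    public static void LogIntentsAndEntities(string rawJson)
    {
        AssistantMessage message = JsonUtility.FromJson<AssistantMessage>(rawJson);

        foreach (AssistantIntent i in message.intents)
            Debug.Log("Intent: " + i.intent + " (" + i.confidence + ")");

        if (message.entities == null || message.entities.Length == 0)
        {
            Debug.Log("No entities in this response.");
            return;
        }

        foreach (AssistantEntity e in message.entities)
            Debug.Log("Entity: " + e.entity + " = " + e.value);
    }
}
Once you have the entity values in hand, you can branch on them however you like in your own scripts.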
Regarding the different speaker recognition: in the Active property whose code you have included, the _service.StartListening(OnRecognize, OnRecognizeSpeaker) line takes care of both, so perhaps put some Debug.Log statements inside their code blocks to see whether they are called or not.
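For reference, this is roughly what those two callbacks look like in the SDK's streaming example; treat it as a sketch, since the event types and field names (SpeechRecognitionEvent, SpeakerRecognitionEvent, speaker_labels, etc.) can differ between SDK versions:
// Requires: using UnityEngine; using System.Collections.Generic;
// plus the Watson Speech to Text namespace of your SDK version.
private void OnRecognize(SpeechRecognitionEvent result, Dictionary<string, object> customData)
{
    if (result == null || result.results == null)
        return;

    foreach (var res in result.results)
    {
        foreach (var alt in res.alternatives)
        {
            // Logs interim and final transcriptions so you can see the callback firing.
            Debug.Log(string.Format("OnRecognize: {0} ({1}, {2:0.00})",
                alt.transcript, res.final ? "Final" : "Interim", alt.confidence));
        }
    }
}

private void OnRecognizeSpeaker(SpeakerRecognitionEvent result, Dictionary<string, object> customData)
{
    if (result == null || result.speaker_labels == null)
        return;

    foreach (SpeakerLabelsResult label in result.speaker_labels)
    {
        // Each label says which speaker was talking between 'from' and 'to' seconds.
        Debug.Log(string.Format("OnRecognizeSpeaker: speaker {0} from {1} to {2} (confidence {3})",
            label.speaker, label.from, label.to, label.confidence));
    }
}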
Answer 2:
Please set SpeakerLabels to true:
_service.SpeakerLabels = true;
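For context, that is the flag set in the Active property from the question; with the fix applied, the relevant lines become (everything else unchanged):
_service.EnableTimestamps = true;   // already true in the question's code
_service.SpeakerLabels = true;      // was false, which would explain why OnRecognizeSpeaker never fired
_service.StartListening(OnRecognize, OnRecognizeSpeaker);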
Source: https://stackoverflow.com/questions/54512540/assistant-entities-and-different-speakers