Disable token breaks on punctuation LUIS.ai

爱⌒轻易说出口 posted on 2020-01-14 14:03:29

Question


I am working with Microsoft Cognitive Services' Language Understanding Intelligent Service (LUIS) API, LUIS.ai.

Whenever text is parsed by LUIS, whitespace tokens are always inserted around punctuation.

This behavior is intentional, according to the documentation.

"English, French, Italian, Spanish: token breaks are inserted at any whitespace, and around any punctuation."

For my project, I need to preserve the original query string without these inserted token breaks, because some entities trained for my model include punctuation, and stripping the extra whitespace from the parsed entities is annoying and a bit hacky.

Example of this behavior:

Is there a way to disable this? It would save quite a bit of effort.

Thanks!!


Answer 1:


Unfortunately, there's no way to disable that for now. The good news is that the returned predictions refer to the original string, not the tokenized one you see in the example labeling process.

In the documentation on how to understand the JSON response, you can see that the example output preserves the original "query" string, and that the extracted entities carry zero-based character indices ("startIndex", "endIndex") into the original string. This lets you work with the indices instead of the parsed entity phrases.

{
  "query": "Book me a flight to Boston on May 4",
  "intents": [
    {
      "intent": "BookFlight",
      "score": 0.919818342
    },
    {
      "intent": "None",
      "score": 0.136909246
    },
    {
      "intent": "GetWeather",
      "score": 0.007304534
    }
  ],
  "entities": [
    {
      "entity": "boston",
      "type": "Location::ToLocation",
      "startIndex": 20,
      "endIndex": 25,
      "score": 0.621795356
    },
    {
      "entity": "may 4",
      "type": "builtin.datetime.date",
      "startIndex": 30,
      "endIndex": 34,
      "resolution": {
        "date": "XXXX-05-04"
      }
    }
  ]
}
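
To make the index-based approach concrete, here is a minimal Python sketch that slices the entity text straight out of the original "query" string, so the lowercased, re-tokenized "entity" value is never needed. Note that in the sample response "endIndex" is inclusive ("Boston" spans 20 through 25), so the slice adds 1. The helper name original_entity_spans is hypothetical and not part of any LUIS SDK, and the sketch assumes the indices count characters the same way Python string indexing does, which may not hold for text outside the Basic Multilingual Plane.

```python
# Sample response copied from the answer (intents and scores omitted for brevity).
response = {
    "query": "Book me a flight to Boston on May 4",
    "entities": [
        {"entity": "boston", "type": "Location::ToLocation",
         "startIndex": 20, "endIndex": 25},
        {"entity": "may 4", "type": "builtin.datetime.date",
         "startIndex": 30, "endIndex": 34},
    ],
}


def original_entity_spans(response):
    """Return (type, original_text) pairs sliced from the untouched query.

    Hypothetical helper for illustration; not part of any LUIS SDK.
    """
    query = response["query"]
    spans = []
    for entity in response.get("entities", []):
        start = entity["startIndex"]
        end = entity["endIndex"]  # inclusive in the sample, hence the + 1 below
        spans.append((entity["type"], query[start:end + 1]))
    return spans


for entity_type, text in original_entity_spans(response):
    print(f"{entity_type}: {text!r}")
```

For the sample above, the slices come back as "Boston" and "May 4" with their original casing. An entity containing punctuation would come back the same way, exactly as typed, because the slice is taken from the query rather than from the tokenized phrase.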



Source: https://stackoverflow.com/questions/38749246/disable-token-breaks-on-punctuation-luis-ai
