Intent classification with large number of intent classes

眉间皱痕 提交于 2021-01-29 08:21:46

问题


I am working on a data set of approximately 3000 questions and I want to perform intent classification. The data set is not labelled yet, but from the business perspective, there's a requirement of identifying approximately 80 various intent classes. Let's assume my training data has approximately equal number of each classes and is not majorly skewed towards some of the classes. I am intending to convert the text to word2vec or Glove and then feed into my classifier.

I am familiar with cases in which I have a smaller number of intent classes, such as 8 or 10 and the choice of machine learning classifiers such as SVM, naive bais or deeplearning (CNN or LSTM).

My question is that if you have had experience with such large number of intent classes before, and which of machine learning algorithm do you think will perform reasonably? do you think if i use deep learning frameworks, still large number of labels will cause poor performance given the above training data?

We need to start labelling the data and it is rather laborious to come up with 80 classes of labels and then realise that it is not performing well, so I want to ensure that I am making the right decision on how many classes of intent maximum I should consider and what machine learning algorithm do you suggest?

Thanks in advance...


回答1:


First, word2vec and GloVe are, almost, dead. You should probably consider using more recent embeddings like BERT or ELMo (both of which are sensitive to the context; in other words, you get different embeddings for the same word in a different context). Currently, BERT is my own preference since it's completely open-source and available (gpt-2 was released a couple of days ago which is apparently a little bit better. But, it's not completely available to the public).

Second, when you use BERT's pre-trained embeddings, your model has the advantage of seeing a massive amount of text (Google massive) and thus can be trained on small amounts of data which will increase it's performance drastically.

Finally, if you could classify your intents into some coarse-grained classes, you could train a classifier to specify which of these coarse-grained classes your instance belongs to. Then, for each coarse-grained class train another classifier to specify the fine-grained one. This hierarchical structure will probably improve the results. Also for the type of classifier, I believe a simple fully connected layer on top of BERT would suffice.



来源:https://stackoverflow.com/questions/54850657/intent-classification-with-large-number-of-intent-classes

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!