Text Classification into Categories


I am working on a text classification problem: I am trying to classify a collection of words into a category. Yes, there are plenty of libraries available for classification, s

3 Answers

旧时难觅i

2020-12-31 23:07

If possible, read the section "A Naive Classifier" in the chapter "Document Filtering" of the book "Programming Collective Intelligence". Although the examples are in Python, I hope that will not be much trouble for you.

小鲜肉

2020-12-31 23:08

Of course this can be implemented. If you train a Naive Bayes classifier or a linear SVM on the right dataset (titles of Java and C# programming books, I guess), it should learn to associate the term "Java" with Java, "C#" and ".NET" with C#, and "programming" with both. That is, a Naive Bayes classifier would likely learn a roughly even probability of Java or C# for common terms like "programming", provided the dataset is split evenly between the two categories.
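
To make that concrete, here is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. The four training titles, the `NaiveBayes` class, and the `tokenize` helper are all made up for illustration; add-one (Laplace) smoothing is an assumption on my part, chosen so that unseen words do not zero out a category.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep '#', '.' and '+' so that "c#" and ".net" survive as terms.
    # (Good enough for short titles; real text would need better punctuation handling.)
    return re.findall(r"[a-z0-9#.+]+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> term counts
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def probabilities(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat in self.doc_counts:
            total_terms = sum(self.word_counts[cat].values())
            # Start from the log prior, then add one log likelihood per token.
            score = math.log(self.doc_counts[cat] / total_docs)
            for tok in tokens:
                count = self.word_counts[cat][tok]
                score += math.log((count + 1) / (total_terms + len(self.vocab)))
            log_scores[cat] = score
        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {cat: math.exp(s - top) for cat, s in log_scores.items()}
        norm = sum(weights.values())
        return {cat: w / norm for cat, w in weights.items()}

    def classify(self, text):
        probs = self.probabilities(text)
        return max(probs, key=probs.get)


# Made-up, evenly split training set of book titles.
clf = NaiveBayes()
clf.train("Java programming basics", "java")
clf.train("Effective Java programming", "java")
clf.train("C# programming basics", "csharp")
clf.train("Effective .NET programming", "csharp")

print(clf.classify("Java in a nutshell"))
print(clf.classify("Learning .NET"))
print(clf.probabilities("programming"))
```

With this tiny balanced dataset, "programming" occurs equally often in both categories, so on its own it does not favor either one, while "Java", "C#" and ".NET" pull a title toward their category.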

挽巷

2020-12-31 23:17

A dirt-simple way of implementing this is to use straight-up Lucene (or any text-indexing engine). Create a single Lucene document containing all of the "java" examples and another document containing all of the "c#" examples, and add both to the index. To classify a new document, OR together all of its terms, run that as a query against the index, and take the category of the highest-scoring hit.
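
The idea can be sketched without Lucene itself. The snippet below is a toy in-memory stand-in written in plain Python, not Lucene's API: the `TinyIndex` class, the example titles, and the simplified TF-IDF weighting are all assumptions for illustration, and a real engine would score hits with its own formula.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep '#', '.' and '+' so that "c#" and ".net" survive as terms.
    return re.findall(r"[a-z0-9#.+]+", text.lower())


class TinyIndex:
    """Hypothetical toy index holding one document per category."""

    def __init__(self):
        self.docs = {}  # category -> Counter of term frequencies

    def add(self, category, examples):
        # Merge every example for the category into a single "document".
        self.docs[category] = Counter(
            tok for example in examples for tok in tokenize(example)
        )

    def search(self, text):
        n_docs = len(self.docs)
        scores = {}
        for category, tf in self.docs.items():
            score = 0.0
            # OR query: each distinct query term that matches adds to the score.
            for term in set(tokenize(text)):
                if tf[term] == 0:
                    continue
                df = sum(1 for doc in self.docs.values() if doc[term] > 0)
                idf = 1.0 + math.log(n_docs / df)  # rarer terms weigh more
                score += math.sqrt(tf[term]) * idf
            scores[category] = score
        # Best match first.
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up examples: one merged document per category.
index = TinyIndex()
index.add("java", ["Java programming basics", "Effective Java programming"])
index.add("c#", ["C# programming basics", "Effective .NET programming"])

hits = index.search("Java concurrency programming")
print(hits[0][0])
```

Terms that appear in both category documents (such as "programming") add the same amount to each score, so the decision comes down to the distinctive terms. If no query term matches anything, every score is 0 and the ordering is arbitrary, so a real implementation would want to treat that case as "unknown".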
