Text Classification into Categories

野的像风 2020-12-31 22:31

I am working on a text classification problem: I am trying to classify a collection of words into categories. Yes, there are plenty of libraries available for classification, s

3 Answers
  • 2020-12-31 23:07

    If possible, read the section "A Naive Classifier" in the chapter "Document Filtering" of the book "Programming Collective Intelligence". Although the examples are in Python, I hope that will not be much trouble for you.

  • 2020-12-31 23:08

    Of course this can be implemented. If you train a Naive Bayes classifier or linear SVM on the right dataset (titles of Java and C# programming books, I guess), it should learn to associate the term "Java" with Java, "C#" and ".NET" with C#, and "programming" with both. I.e., a Naive Bayes classifier would likely learn a roughly even probability of Java or C# for common terms like "programming" if the dataset is divided evenly.
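To make the idea above concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. The class name `NaiveBayes`, the `tokenize` helper, and the six book titles are all made up for illustration; a real dataset would be much larger, and a library implementation would likely be a better choice in practice.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace, so tokens like "c#" and ".net" stay intact.
    return text.lower().split()


class NaiveBayes:
    """Hypothetical minimal multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> number of training texts
        self.vocab = set()

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]  # skip unseen words
        scores = {}
        for category in self.doc_counts:
            # log P(category) + sum of log P(word | category)
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[category] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training titles, evenly divided between the two categories.
nb = NaiveBayes()
for title in ["java programming basics", "effective java", "java concurrency in practice"]:
    nb.train(title, "java")
for title in ["c# programming basics", "pro c# and .net", ".net framework essentials"]:
    nb.train(title, "csharp")

print(nb.classify("learning java"))     # "java" dominates the score
print(nb.classify(".net programming"))  # ".net" dominates the score
print(nb.log_scores("programming"))     # the two scores are close to each other
```

On this toy data the shared word "programming" ends up with nearly equal scores for both categories, while "java" and ".net" pull a title clearly to one side, which is the behaviour described above.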

  • 2020-12-31 23:17

    A dirt-simple way to implement this is to use plain Lucene (or any text-indexing engine). Create a single Lucene document containing all of the "java" examples and another containing all of the "c#" examples, and add both to the index. To classify a new document, OR together all of its terms, run that query against the index, and take the category with the highest score.
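The same idea can be sketched without Lucene. The snippet below is not Lucene's API and does not reproduce Lucene's scoring; it swaps in a simple TF-IDF style score in plain Python to show the shape of the approach: one concatenated "document" per category, and an OR query where every matching term adds to that category's score. The function names and example titles are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace, so "c#" and ".net" stay intact.
    return text.lower().split()


def build_index(examples_by_category):
    # One concatenated "document" per category, stored as term frequencies.
    return {
        category: Counter(tokenize(" ".join(examples)))
        for category, examples in examples_by_category.items()
    }


def score(index, query):
    # OR semantics: each query term that occurs in a category document adds
    # to that document's score; terms that do not occur add nothing.
    n_docs = len(index)
    terms = set(tokenize(query))
    scores = {}
    for category, term_freq in index.items():
        total = 0.0
        for term in terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            # Number of category documents that contain the term.
            df = sum(1 for other in index.values() if other[term] > 0)
            idf = 1.0 + math.log(n_docs / df)
            total += math.sqrt(tf) * idf
        scores[category] = total
    return scores


def classify(index, query):
    scores = score(index, query)
    return max(scores, key=scores.get)


# Made-up examples for illustration only.
index = build_index({
    "java": ["java programming basics", "effective java", "java concurrency in practice"],
    "csharp": ["c# programming basics", "pro c# and .net", ".net framework essentials"],
})

print(classify(index, "java programming for beginners"))
print(classify(index, "pro .net"))
```

One caveat of this sketch: if no query term appears in any category document, every score is zero and `classify` simply returns the first category, so a real implementation would want to treat that case as "unknown".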
