Text classification methods? SVM and decision tree

没有蜡笔的小新 2021-02-05 16:27

I have a training set, and I want to use a classification method to classify other documents based on it. My documents are news articles, and the categories are sports

3 answers
  •  庸人自扰
    2021-02-05 16:57

    • Naive Bayes

    Although this is the simplest algorithm and it treats every feature as independent, it works well in real text classification tasks. I would certainly try it first.
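To make the independence assumption concrete, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing, using only the Python standard library. The whitespace tokenizer, the toy documents, and the "politics" label are my own assumptions for illustration; they are not from the question.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum of log P(token | class); each token is
            # treated as independent of the others given the class.
            score = math.log(n_docs / total_docs)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for token in tokenize(doc):
                if token in self.vocab:  # skip tokens never seen in training
                    score += math.log((self.token_counts[label][token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set; real news data would be far larger.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "parliament passed the new budget",
    "the minister announced an election",
]
train_labels = ["sports", "sports", "politics", "politics"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("a late goal decided the match")
print(prediction)
```

Working in log space avoids underflow when documents are long, and the add-one term keeps a token that is unseen in one class from zeroing out that class entirely.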

    • KNN

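    KNN (k-nearest neighbours) is a supervised classification method, so it can be used here. It should not be confused with k-means, which is a clustering algorithm. Make sure you keep the concepts of clustering and classification apart.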
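For reference, k-nearest neighbours used as a classifier can be sketched as below: each document becomes a bag-of-words vector, and the query takes the majority label of its k most similar training documents. Cosine similarity, the whitespace tokenizer, the helper names, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def vectorize(text):
    # Bag-of-words term counts; a naive whitespace tokenizer is assumed.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=3):
    """train is a list of (text, label) pairs. Returns the majority label
    among the k training documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(train, key=lambda pair: cosine(vectorize(pair[0]), q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set.
train = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the goal keeper", "sports"),
    ("parliament passed the new budget", "politics"),
    ("the minister announced an election", "politics"),
    ("voters rejected the budget proposal", "politics"),
]

label = knn_predict(train, "striker scored the winning goal", k=3)
print(label)
```

With raw counts, frequent words such as "the" can dominate the similarity, so stop-word removal or TF-IDF weighting is usually worth trying before judging kNN on real data.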

    • SVM

    SVM comes in SVC (classification) and SVR (regression) variants. It sometimes works well, but in my experience it performs poorly on text classification because it depends heavily on good tokenizers (filters), and the vocabulary of a real dataset almost always contains dirty tokens. In those cases the accuracy was really bad.

    • Random Forest (decision tree)

    I have never tried this method for text classification. I think a decision tree needs a few key nodes to split on, and it is hard to find "several key tokens" in text; random forests also work poorly on high-dimensional sparse features.

    FYI

    All of this comes from my own experience. For your case, there is no better way to choose a method than to try each algorithm on your data and compare the results.
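One way to "try every algorithm" is to hold out part of the labelled data and score each candidate on it. The sketch below shows only the comparison loop; the two candidates (a majority-class baseline and a simple token-overlap classifier) are hypothetical stand-ins for whichever real models you want to compare, and the data is made up.

```python
from collections import Counter


def accuracy(predict, test_pairs):
    """Fraction of (text, label) pairs for which predict(text) == label."""
    correct = sum(1 for text, label in test_pairs if predict(text) == label)
    return correct / len(test_pairs)


def make_majority_baseline(train_pairs):
    # Always predicts the most frequent training label.
    majority = Counter(label for _, label in train_pairs).most_common(1)[0][0]
    return lambda text: majority


def make_overlap_classifier(train_pairs):
    # Predicts the label whose training vocabulary shares the most tokens
    # with the text.
    vocab = {}
    for text, label in train_pairs:
        vocab.setdefault(label, set()).update(text.lower().split())

    def predict(text):
        tokens = set(text.lower().split())
        return max(vocab, key=lambda label: len(tokens & vocab[label]))

    return predict


# Hypothetical labelled data, split by hand into training and held-out sets.
train = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the goal keeper", "sports"),
    ("parliament passed the new budget", "politics"),
    ("the minister announced an election", "politics"),
]
held_out = [
    ("the striker won the match", "sports"),
    ("the coach scored a goal", "sports"),
    ("parliament announced the budget", "politics"),
    ("the minister passed an election budget", "politics"),
]

candidates = {
    "majority": make_majority_baseline(train),
    "overlap": make_overlap_classifier(train),
}
results = {name: accuracy(predict, held_out) for name, predict in candidates.items()}
best = max(results, key=results.get)
print(results, best)
```

Keeping a trivial baseline in the comparison is useful: a model that cannot beat "always predict the most common category" is not worth tuning further.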

    Apache Mahout is a great tool for machine learning. It covers three families of algorithms: recommendation, clustering, and classification. You could try this library, but you will need some basic knowledge of Hadoop first.

    Weka is another machine learning toolkit; it integrates many algorithms and is handy for experiments.
