Apache Spark Naive Bayes based Text Classification

名媛妹妹 2021-02-03 15:47

I'm trying to use Apache Spark for document classification.

For example, I have two classes (C and J).

The training data is:

C, Chinese Beijing Chi         


        
4 answers
  •  有刺的猬
    2021-02-03 16:37

    Yes, there doesn't appear to be a simple tool for this in Spark yet. But you can do it manually: first create a dictionary of terms, then compute the IDF for each term, and finally convert each document into a vector using the TF-IDF scores.

    There is a post at http://chimpler.wordpress.com/2014/06/11/classifiying-documents-using-naive-bayes-on-apache-spark-mllib/ that explains how to do it (with some code as well).
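
To make the three manual steps concrete, here is a minimal sketch in plain Python (no Spark dependency), not the code from the linked post. The toy corpus, the whitespace tokenizer, the helper names (`tokenize`, `tfidf_vector`), and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (label, text) pairs; NOT the asker's training data.
docs = [
    ("C", "chinese beijing"),
    ("C", "chinese shanghai"),
    ("J", "tokyo japan chinese"),
]

def tokenize(text):
    # Assumption: lowercase whitespace splitting is enough for this sketch.
    return text.lower().split()

# Step 1: build the dictionary, mapping each term to a fixed vector index.
vocab = sorted({term for _, text in docs for term in tokenize(text)})
term_index = {term: i for i, term in enumerate(vocab)}

# Step 2: compute an IDF per term from document frequencies.
# Assumption: the smoothed variant log((N + 1) / (df + 1)), which never goes negative.
doc_freq = Counter(term for _, text in docs for term in set(tokenize(text)))
n_docs = len(docs)
idf = {term: math.log((n_docs + 1) / (doc_freq[term] + 1)) for term in vocab}

# Step 3: convert a document into a dense TF-IDF vector.
def tfidf_vector(text):
    counts = Counter(tokenize(text))
    vec = [0.0] * len(vocab)
    for term, tf in counts.items():
        if term in term_index:  # terms outside the dictionary are ignored
            vec[term_index[term]] = tf * idf[term]
    return vec

# Labeled vectors, ready to hand to a Naive Bayes trainer.
vectors = [(label, tfidf_vector(text)) for label, text in docs]
```

In Spark the same three steps would run as distributed operations over the corpus, and the labeled vectors would then be passed to the Naive Bayes trainer. Keeping the term-to-index dictionary and the IDF table around matters, because new documents must be vectorized with exactly the same mapping at prediction time.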
