Apache Spark Naive Bayes based Text Classification

名媛妹妹 2021-02-03 15:47

I'm trying to use Apache Spark for document classification.

For example, I have two classes (C and J).

The training data is:

C, Chinese Beijing Chi         


        
4 answers
  •  礼貌的吻别
    2021-02-03 16:22

    There are many classification methods (logistic regression, SVMs, neural networks, LDA, QDA, ...). You can either implement your own or use one of the MLlib classification methods (MLlib already implements logistic regression and SVM).

    What you need to do is transform your features into a vector and your labels into doubles.

    For example, your dataset would look like:

    1, (2,1,0,0,0,0)
    1, (2,0,1,0,0,0)
    0, (1,0,0,1,0,0)
    0, (1,0,0,0,1,1)
    

    And your test vector:

    (3,0,0,0,1,1)
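
As a rough illustration of how such vectors could be produced, here is a minimal bag-of-words sketch in plain Python (no Spark needed). The training data in the question is cut off, so the documents and the vocabulary order below are assumptions, chosen only so that the word counts match the vectors above. The helper names `to_count_vector` and `to_labeled_point` are hypothetical as well.

```python
from collections import Counter

# Assumed vocabulary: one vector position per word, in this fixed order.
VOCAB = ["Chinese", "Beijing", "Shanghai", "Macao", "Tokyo", "Japan"]


def to_count_vector(text, vocab=VOCAB):
    """Count how often each vocabulary word occurs in a whitespace-split text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


def to_labeled_point(label, text):
    """Pair a label (converted to a double) with the count vector of the text."""
    return (float(label), to_count_vector(text))


# Hypothetical documents; labels follow the 1/0 encoding used above.
train_docs = [
    (1, "Chinese Beijing Chinese"),
    (1, "Chinese Chinese Shanghai"),
    (0, "Chinese Macao"),
    (0, "Tokyo Japan Chinese"),
]

train = [to_labeled_point(label, text) for label, text in train_docs]
test_vector = to_count_vector("Chinese Chinese Chinese Tokyo Japan")

for label, features in train:
    print(label, features)
print(test_vector)
```

In Spark itself, each pair would typically be wrapped as `LabeledPoint(label, Vectors.dense(features))` (from `pyspark.mllib.regression` and `pyspark.mllib.linalg`) before being passed to an MLlib classifier.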
    

    Hope this helps
