I'm trying to use Apache Spark for document classification.
For example, I have two classes (C and J).
The training data is:
C, Chinese Beijing Chi
You can use MLlib's naive Bayes classifier for this. An example is given in the documentation: http://spark.apache.org/docs/latest/mllib-naive-bayes.html
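To make it clearer what that classifier does with text, here is a minimal sketch of multinomial naive Bayes with add-one smoothing in plain Python. It is not Spark code and not the MLlib API: the names `train`, `predict`, and `TRAIN`, and the tiny training set, are made up for illustration and are not the data from the question. In MLlib you would first turn each document into a numeric term-count vector and pass labeled vectors to the classifier instead.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (label, document text). Not the data from the question.
TRAIN = [
    ("C", "beijing shanghai beijing"),
    ("C", "beijing macao"),
    ("J", "tokyo kyoto"),
    ("J", "tokyo osaka beijing"),
]

def train(docs, smoothing=1.0):
    """Fit a multinomial naive Bayes model with additive smoothing."""
    class_docs = Counter()              # number of documents per class
    term_counts = defaultdict(Counter)  # term frequencies per class
    vocab = set()
    for label, text in docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_docs.items()}

    log_like = {}
    for c in class_docs:
        # Smoothed denominator: class token count + smoothing * vocabulary size.
        denom = sum(term_counts[c].values()) + smoothing * len(vocab)
        log_like[c] = {
            t: math.log((term_counts[c][t] + smoothing) / denom) for t in vocab
        }
    return log_prior, log_like, vocab

def predict(model, text):
    """Return the class with the highest log posterior score."""
    log_prior, log_like, vocab = model
    # Words never seen in training are ignored in this sketch.
    tokens = [t for t in text.lower().split() if t in vocab]
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in tokens) for c in log_prior
    }
    return max(scores, key=scores.get)

model = train(TRAIN)
print(predict(model, "beijing beijing tokyo"))
print(predict(model, "tokyo kyoto"))
```

With this toy data the first test document is assigned to C and the second to J. Scores are summed in log space so that long documents do not underflow to zero.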