Spark MLlib TF-IDF implementation for LogisticRegression

柔情痞子, submitted on 2019-11-28 07:02:51

IDFModel.transform() accepts a JavaRDD or RDD of Vector, as you can see. It does not make sense to compute a model over a single Vector, so that's not what you're looking for, right?

I assume you're working in Java, so you mean you want to apply this to a JavaRDD<LabeledPoint>. A LabeledPoint contains a Vector and a label. IDF is not a classifier or regressor, so it needs no label. You can map over the LabeledPoints to extract just their Vectors.

But you already have a JavaRDD<Vector> above. TF-IDF is merely a way of mapping words to real-valued features based on word frequencies in the corpus, and it does not output a label either. Maybe you mean that you want to build a classifier from TF-IDF-derived feature vectors plus some labels you already have?
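If that is the goal, here is a rough plain-Java sketch of the idea, deliberately without Spark so it runs on its own. Everything in it (the TfIdfSketch class, the Labeled stand-in for LabeledPoint, the fitIdf/transform/reweight helpers, and the toy data) is hypothetical and only for illustration. It assumes the smoothed IDF form log((m + 1) / (df + 1)), which is the one described in the MLlib documentation as far as I know. The point is that the IDF weights are fitted from the feature vectors alone, and the labels are simply carried along and re-attached afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is NOT the Spark MLlib API.
class TfIdfSketch {

    // Hypothetical stand-in for LabeledPoint: a label plus a dense term-frequency vector.
    static final class Labeled {
        final double label;
        final double[] features;

        Labeled(double label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    // Fit IDF weights over the whole corpus. Only feature vectors are needed, no labels.
    // Assumed formula: idf_j = log((m + 1) / (df_j + 1)), with m = number of documents
    // and df_j = number of documents in which term j occurs.
    static double[] fitIdf(List<double[]> termFrequencies) {
        int m = termFrequencies.size();
        int n = termFrequencies.get(0).length;
        double[] idf = new double[n];
        for (int j = 0; j < n; j++) {
            int df = 0;
            for (double[] doc : termFrequencies) {
                if (doc[j] > 0) {
                    df++;
                }
            }
            idf[j] = Math.log((m + 1.0) / (df + 1.0));
        }
        return idf;
    }

    // Scale one term-frequency vector by the fitted IDF weights.
    static double[] transform(double[] idf, double[] tf) {
        double[] out = new double[tf.length];
        for (int j = 0; j < tf.length; j++) {
            out[j] = tf[j] * idf[j];
        }
        return out;
    }

    // Drop the labels, fit IDF on the vectors, then re-attach each label
    // to its TF-IDF vector so a classifier could be trained on the result.
    static List<Labeled> reweight(List<Labeled> data) {
        List<double[]> vectors = new ArrayList<>();
        for (Labeled p : data) {
            vectors.add(p.features);
        }
        double[] idf = fitIdf(vectors);
        List<Labeled> out = new ArrayList<>();
        for (Labeled p : data) {
            out.add(new Labeled(p.label, transform(idf, p.features)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy corpus: three documents, two terms, made-up labels.
        List<Labeled> data = new ArrayList<>();
        data.add(new Labeled(1.0, new double[] {1, 1}));
        data.add(new Labeled(0.0, new double[] {2, 0}));
        data.add(new Labeled(0.0, new double[] {1, 0}));
        for (Labeled p : reweight(data)) {
            System.out.println(p.label + " " + Arrays.toString(p.features));
        }
    }
}
```

In Spark the same shape would apply: extract the Vectors from the labeled data, fit and apply the IDF model to them, then pair the resulting vectors back up with the labels before handing them to the classifier.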

Maybe that clears things up. Otherwise, you'd have to clarify in much more detail what you are trying to achieve with TF-IDF.
