How do I get perplexity and log likelihood in Spark LDA? [closed]

試著忘記壹切 submitted on 2019-12-10 12:01:56

Question


I'm trying to get perplexity and log likelihood of a Spark LDA model (with Spark 2.1). The code below does not work (methods logLikelihood and logPerplexity not found) although I can save the model.

from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors

# construct corpus
# run LDA
ldaModel = LDA.train(corpus, k=10, maxIterations=10)
logll = ldaModel.logLikelihood(corpus)
perplexity = ldaModel.logPerplexity(corpus)

Note that these methods do not show up in dir(LDA) either.

What would be a working example?


Answer 1:


"I can do train but not fit. 'LDA' object has no attribute 'fit'"

That's because you are working with the old, RDD-based API (MLlib), i.e.

from pyspark.mllib.clustering import LDA # WRONG import

whose LDA class indeed does not include fit, logLikelihood, or logPerplexity methods.

To use these methods, switch to the new, DataFrame-based API (ML):

from pyspark.ml.clustering import LDA  # NOTE: different import

# Loads data.
dataset = (spark.read.format("libsvm")
    .load("data/mllib/sample_lda_libsvm_data.txt"))

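# Trains an LDA model.
lda = LDA(k=10, maxIter=10)
model = lda.fit(dataset)

# Lower bound on the log likelihood of the corpus (higher is better).
ll = model.logLikelihood(dataset)
# Upper bound on perplexity (lower is better).
lp = model.logPerplexity(dataset)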
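
If it helps to see how the two numbers relate: as far as I know, Spark derives the log perplexity from the log likelihood by negating it and dividing by the total number of tokens in the corpus. The sketch below reproduces that arithmetic in plain Python, without Spark. The helper names, the toy corpus, and the log-likelihood value are all made up for illustration, and the formula itself is my understanding of Spark's behaviour rather than something stated in the question, so treat it as an assumption.

```python
# Illustrative only: plain Python, no Spark required.
# Assumption: logPerplexity = -logLikelihood / (total token count of the corpus).

def corpus_token_count(term_count_vectors):
    """Sum every term count over every document (hypothetical helper)."""
    return sum(sum(doc) for doc in term_count_vectors)


def log_perplexity(log_likelihood, term_count_vectors):
    """Negative log likelihood per token; lower is better (hypothetical helper)."""
    tokens = corpus_token_count(term_count_vectors)
    if tokens == 0:
        raise ValueError("corpus has no tokens")
    return -log_likelihood / tokens


# Made-up toy corpus: 3 documents over a 4-term vocabulary (term counts).
toy_corpus = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]

# Made-up value standing in for model.logLikelihood(dataset).
toy_ll = -25.0

toy_lp = log_perplexity(toy_ll, toy_corpus)
print(toy_lp)
```

This is why the two metrics move together on a fixed dataset: a higher (less negative) log likelihood gives a lower log perplexity. When comparing models with different k, evaluate them on the same held-out dataset so the token count in the denominator is identical.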


Source: https://stackoverflow.com/questions/48383466/how-do-i-get-perplexity-and-log-likelihood-in-spark-lda
