SageMaker LDA topic model - how to access the params of the trained model? Also, is there a simple way to capture coherence?

白昼怎懂夜的黑 submitted on 2019-12-08 03:14:59

Question


I'm new to SageMaker and am running some tests to compare the performance of NTM and LDA on AWS with LDA Mallet and the native Gensim LDA model.

I want to inspect the trained models on SageMaker and see things like which words contribute most to each topic. I would also like to get a measure of model coherence.

I have been able to get the highest-contributing words for each topic for NTM on SageMaker by downloading the output file, untarring it, and unzipping the result to expose three files: params, symbol.json, and meta.json.

However, when I try to do the same process for LDA, the untarred output file cannot be unzipped.

Maybe I'm missing something, or should do something different for LDA than for NTM, but I have not been able to find any documentation on this. Also, has anyone found a simple way to calculate model coherence?

Any assistance would be greatly appreciated!


Answer 1:


This SageMaker notebook, which dives into the scientific details of LDA, also demonstrates how to inspect the model artifacts, specifically how to obtain the estimates for the Dirichlet prior alpha and the topic-word distribution matrix beta. You can find the instructions in the section titled "Inspecting the Trained Model". For convenience, I will reproduce the relevant code here:

import os
import tarfile
import mxnet as mx

# FILENAME_PREFIX is a placeholder: the directory where the tarball is located
tarfile_fname = os.path.join(FILENAME_PREFIX, 'model.tar.gz')

# extract the tarball into that same directory
with tarfile.open(tarfile_fname) as tar:
    tar.extractall(path=FILENAME_PREFIX)

# obtain the model file (should be the only file starting with "model_")
model_list = [
    fname
    for fname in os.listdir(FILENAME_PREFIX)
    if fname.startswith('model_')
]
model_fname = os.path.join(FILENAME_PREFIX, model_list[0])

# load the contents of the model file into MXNet arrays
alpha, beta = mx.ndarray.load(model_fname)

That should get you the model data. Note that the topics, which are stored as rows of beta, are not presented in any particular order.
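Once beta is loaded, the top words per topic and a coherence score (the two things the question asks about) can be computed from it. The following is a minimal sketch and is not part of the referenced notebook. It assumes beta has been converted to NumPy with beta.asnumpy(), and that you kept the vocabulary list and the document-term count matrix used for training, since the model artifact stores neither. The helper names top_word_ids and umass_coherence and the toy data are hypothetical. The coherence function implements the UMass measure, which is one common choice among several and is not something SageMaker reports itself.

```python
import numpy as np


def top_word_ids(beta, n_top=10):
    """Return an array of shape (n_topics, n_top) holding, for each row
    (topic) of beta, the column indices of the largest weights, largest first."""
    beta = np.asarray(beta)
    # argsort is ascending, so reverse each row before truncating
    return np.argsort(beta, axis=1)[:, ::-1][:, :n_top]


def umass_coherence(word_ids, doc_term):
    """UMass coherence of one topic.

    word_ids: top word indices for the topic, most probable first.
    doc_term: (n_docs, vocab_size) count matrix of the training corpus.
    Assumes every top word occurs in at least one document.
    """
    present = np.asarray(doc_term) > 0  # document-level occurrence
    score = 0.0
    for m in range(1, len(word_ids)):
        for l in range(m):
            both = np.sum(present[:, word_ids[m]] & present[:, word_ids[l]])
            docs_l = np.sum(present[:, word_ids[l]])
            score += np.log((both + 1.0) / docs_l)
    return score


# Toy stand-ins (hypothetical). With the real model use: beta.asnumpy()
toy_beta = np.array([
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.05, 0.30, 0.60],
])
toy_vocab = ["cat", "dog", "stock", "bond"]
toy_doc_term = np.array([
    [1, 1, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 3, 0],
])

ids = top_word_ids(toy_beta, n_top=2)
top_words = [[toy_vocab[j] for j in row] for row in ids]
scores = [umass_coherence(row, toy_doc_term) for row in ids]

for k, (words, score) in enumerate(zip(top_words, scores)):
    print(k, words, round(float(score), 3))
```

Scores closer to zero indicate top words that co-occur more often in the corpus. If Gensim is already in use for the comparison models, its CoherenceModel class also accepts a plain list of topic word lists, which would keep the metric identical across all models being compared.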



Source: https://stackoverflow.com/questions/54924835/sagemaker-lda-topic-model-how-to-access-the-params-of-the-trained-model-also
