Concept Behind The Transformed Data Of LDA Model

◇◆丶佛笑我妖孽 submitted on 2020-01-05 03:36:18

Question


My question is about Latent Dirichlet Allocation (LDA). Suppose we fit an LDA model on our dataset and then apply fit_transform to it.

The output is a matrix covering five documents, where each document is represented by its proportions over three topics. The output is below:

[[ 0.0922935   0.09218227  0.81552423]
 [ 0.81396651  0.09409428  0.09193921]
 [ 0.05265482  0.05240119  0.89494398]
 [ 0.05278187  0.89455775  0.05266038]
 [ 0.85209554  0.07338382  0.07452064]]

So, this is the matrix that will be passed to a classification method for evaluation.

For the classification step, we need a label for each row. But we do not have labels, which means I have to create them myself.


One approach could be to take the topic with the highest probability in each document as that document's label.

For example, the labels may be like so:

[2, 0, 2, 1, 0]
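
The highest-probability rule can be sketched in plain Python as follows. This is only a minimal illustration: the names `doc_topic` and `labels` are made up for this example, and the matrix is copied from the output shown earlier.

```python
# Document-topic matrix copied from the LDA output in the question.
# The variable names are illustrative, not from the original post.
doc_topic = [
    [0.0922935, 0.09218227, 0.81552423],
    [0.81396651, 0.09409428, 0.09193921],
    [0.05265482, 0.05240119, 0.89494398],
    [0.05278187, 0.89455775, 0.05266038],
    [0.85209554, 0.07338382, 0.07452064],
]

# For each document (row), take the index of the topic with the largest probability.
labels = [max(range(len(row)), key=row.__getitem__) for row in doc_topic]

print(labels)  # [2, 0, 2, 1, 0]
```

With the matrix held in a NumPy array, `argmax(axis=1)` would give the same labels.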

However, this is a very simple example.

I could also keep the two highest probabilities for each document, on the assumption that each document has only two topics. The example would then look like this:

[[ 0.0922935   0  0.81552423]
 [ 0.81396651  0.09409428  0]
 [ 0.05265482  0  0.89494398]
 [ 0.05278187  0.89455775  0]
 [ 0.85209554  0  0.07452064]]
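
A minimal sketch of this "keep the two largest values per row" idea, again in plain Python. The helper `keep_top_k` and the variable names are hypothetical, introduced only for illustration.

```python
def keep_top_k(row, k):
    """Return a copy of `row` with all but the k largest entries set to 0.0."""
    # Indices of the k largest probabilities in this row.
    top = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
    keep = set(top)
    return [p if i in keep else 0.0 for i, p in enumerate(row)]


# Document-topic matrix copied from the LDA output in the question.
doc_topic = [
    [0.0922935, 0.09218227, 0.81552423],
    [0.81396651, 0.09409428, 0.09193921],
    [0.05265482, 0.05240119, 0.89494398],
    [0.05278187, 0.89455775, 0.05266038],
    [0.85209554, 0.07338382, 0.07452064],
]

# Keep the two most probable topics per document, zero out the third.
top2 = [keep_top_k(row, 2) for row in doc_topic]

for row in top2:
    print(row)
```

Note that the masked rows no longer sum to 1, so they are no longer proper probability distributions unless they are renormalized.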

As you can see, my rule is to keep the original probability for the topics with the highest probabilities in each document and set the remaining one to 0.

Which approach is correct? Has anyone used another approach that is more meaningful?

Many thanks in advance!

Source: https://stackoverflow.com/questions/45654463/concept-behind-the-transformed-data-of-lda-model
