Question
I have code like this:

x_train = data['TOKEN'].loc[:2]
y = data['label'].loc[:2]
x_test = data['TOKEN'].loc[3:]

It contains 3 training documents, one per class (-1, 0, 1), and 1 test document.
# TF-IDF for the training data
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer(smooth_idf=False, norm=None)
x_tfidf_train = tfidf.fit_transform(x_train)
tfidfframe_train = pd.DataFrame(x_tfidf_train.toarray(), columns=tfidf.get_feature_names())
# Output of tfidfframe_train:
       a      b      c      d      e    f
0    0.0    0.0    0.0  1.477  1.477  1.0   -> class -1, training doc 1
1    0.0    0.0  1.176    0.0    0.0  1.0   -> class  0, training doc 2
2  1.477  1.477  1.176    0.0    0.0  1.0   -> class  1, training doc 3
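For reference, with smooth_idf=False and norm=None each weight is tf * (log(N/df) + 1), and the table values line up when the log is taken in base 10 (my assumption throughout, since the outputs shown here are in base 10; sklearn itself uses the natural log). A minimal sketch:

import numpy as np

N = 3  # number of training documents

def tfidf_weight(tf, df):
    # smooth_idf=False, norm=None; base-10 log assumed to match the outputs above
    return tf * (np.log10(N / df) + 1)

print(tfidf_weight(1, 1))  # 1.477 -> words a, b, d, e (each in 1 document)
print(tfidf_weight(1, 2))  # 1.176 -> word c (in 2 documents)
print(tfidf_weight(1, 3))  # 1.0   -> word f (in all 3 documents)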
# TF-IDF for the test data
x_tfidf_test = tfidf.transform(x_test)
tfidfframe_test = pd.DataFrame(x_tfidf_test.toarray(), columns=tfidf.get_feature_names())
# Output of tfidfframe_test:
     a    b     c    d    e    f
0  0.0  0.0  1.17  0.0  0.0  1.0
So now we know that the test document contains the words c and f. I fit the training data to MultinomialNB:
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB(alpha=1.0)
classifier = model.fit(x_tfidf_train, y)
print('class log prior:\n', classifier.class_log_prior_)

# Output (log base 10): log10(1/3) = -0.47712125, so this output is correct
[-0.47712125 -0.47712125 -0.47712125]
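As a quick sanity check (my own sketch, assuming the base-10 convention used above; sklearn actually stores class_log_prior_ in natural log):

import numpy as np

print(np.log10(1 / 3))  # -0.47712125..., the uniform prior over 3 classes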
print('Conditional probabilities:\n', classifier.feature_log_prob_)  # log P(w|c)

# This output is also correct: it comes from plugging the training TF-IDF
# values above into the log10 P(w|c) calculation.
          a           b           c           d           e           f
[[-0.99800822 -0.99800822 -0.99800822 -0.60406095 -0.60406095 -0.69697822]   -> class -1, training doc 1
 [-0.91254573 -0.91254573 -0.57486863 -0.91254573 -0.91254573 -0.61151573]   -> class  0, training doc 2
 [-0.65256092 -0.65256092 -0.70883108 -1.04650819 -1.04650819 -0.74547819]]  -> class  1, training doc 3
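For context, these values follow the smoothed estimate log((count(w, c) + alpha) / (total(c) + alpha * n_features)), where the "counts" are the TF-IDF weights. A minimal sketch for class -1, again assuming base-10 logs:

import numpy as np

alpha = 1.0
n_features = 6
doc1 = np.array([0.0, 0.0, 0.0, 1.477, 1.477, 1.0])  # training TF-IDF row for class -1

total = doc1.sum() + alpha * n_features  # 3.954 + 6
print(np.log10((doc1 + alpha) / total))
# [-0.998 -0.998 -0.998 -0.604 -0.604 -0.697] -> matches the class -1 row above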
Now the problem is when I try to calculate the maximum class log posterior of the test data. It should be log P(c) plus the sum of log P(w|c) over the words in the document, which in sklearn is known as _joint_log_likelihood. So we can calculate it manually for the test words [c, f]:
     word c           word f         log10 P(class)
-0.99800822  +  -0.69697822  +  -0.47712125  =  -2.17210769  -> class -1
-0.57486863  +  -0.61151573  +  -0.47712125  =  -1.66350558  -> class  0
-0.70883108  +  -0.74547819  +  -0.47712125  =  -1.93143052  -> class  1
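The same hand calculation as a sketch (indices 2 and 5 are words c and f in the feature order a..f); note it adds each present word's log probability exactly once:

import numpy as np

feature_log_prob = np.array([
    [-0.99800822, -0.99800822, -0.99800822, -0.60406095, -0.60406095, -0.69697822],  # class -1
    [-0.91254573, -0.91254573, -0.57486863, -0.91254573, -0.91254573, -0.61151573],  # class  0
    [-0.65256092, -0.65256092, -0.70883108, -1.04650819, -1.04650819, -0.74547819],  # class  1
])
class_log_prior = np.array([-0.47712125, -0.47712125, -0.47712125])

manual = feature_log_prob[:, 2] + feature_log_prob[:, 5] + class_log_prior
print(manual)  # ≈ [-2.1721 -1.6635 -1.9314]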
But when I output it from the system, the result does not match:

jll = classifier._joint_log_likelihood(x_tfidf_test)

# Output, columns sorted left to right as (-1, 0, 1):
#    class -1     class 0     class 1
[[-2.34784822 -1.76473496 -2.05624949]]
What is wrong in MultinomialNB's _joint_log_likelihood? The naive_bayes.py source of MultinomialNB defines it as:

def _joint_log_likelihood(self, X):
    """Calculate the posterior log probability of the samples X"""
    return (safe_sparse_dot(X, self.feature_log_prob_.T) +
            self.class_log_prior_)
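To see what that dot product computes, here is my own reproduction with plain NumPy, using the test TF-IDF row [0, 0, 1.176, 0, 0, 1.0] from above. Each log P(w|c) is multiplied by the TF-IDF weight of w in X (1.176 for word c), rather than added once per present word:

import numpy as np

x_row = np.array([0.0, 0.0, 1.176, 0.0, 0.0, 1.0])  # test TF-IDF row (words c and f)

feature_log_prob = np.array([
    [-0.99800822, -0.99800822, -0.99800822, -0.60406095, -0.60406095, -0.69697822],  # class -1
    [-0.91254573, -0.91254573, -0.57486863, -0.91254573, -0.91254573, -0.61151573],  # class  0
    [-0.65256092, -0.65256092, -0.70883108, -1.04650819, -1.04650819, -0.74547819],  # class  1
])
class_log_prior = np.array([-0.47712125, -0.47712125, -0.47712125])

jll = x_row @ feature_log_prob.T + class_log_prior
print(jll)  # ≈ [-2.3478 -1.7647 -2.0562], the system output above
# (the last digits differ slightly because 1.176 is a rounded value)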
Maybe you can review this and tell me what is wrong; this is all the data I have. I hope you can answer it.
Source: https://stackoverflow.com/questions/63184256/joint-log-likelihood-give-me-wrong-values