Question:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
vectorizer = vectorizer.fit(word_data)
freq_term_mat = vectorizer.transform(word_data)

from sklearn.feature_extraction.text import TfidfTransformer
tfidf = TfidfTransformer(norm="l2")
tfidf = tfidf.fit(freq_term_mat)
Ttf_idf_matrix = tfidf.transform(freq_term_mat)

voc_words = Ttf_idf_matrix.getfeature_names()
print "The num of words = ",len(voc_words)
When I run the program containing this piece of code, I get the following error:
Traceback (most recent call last):
  File "vectorize_text.py", line 87, in <module>
    voc_words = Ttf_idf_matrix.getfeature_names()
  File "/home/farheen/anaconda/lib/python2.7/site-packages/scipy/sparse/base.py", line 499, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: get_feature_names not found
Please suggest a solution.
Answer 1:
I see two problems with your code. First, you are applying get_feature_names() to your matrix output, rather than to the vectorizer. You need to apply it to the vectorizer. Second, you are unnecessarily breaking this apart into too many steps. You can use TfidfVectorizer.fit_transform() to do what you want in much less space. Try this:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
transformed = vectorizer.fit_transform(word_data)
print "Num words:", len(vectorizer.get_feature_names())
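For readers on Python 3 and a recent scikit-learn, here is a minimal sketch of the same idea. The contents of word_data are a made-up stand-in for the asker's data. Newer scikit-learn releases provide get_feature_names_out() in place of get_feature_names(), so the sketch checks for it and falls back to the older name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the asker's word_data (a list of document strings).
word_data = ["the cat sat on the mat", "the dog sat on the log"]

vectorizer = TfidfVectorizer()
transformed = vectorizer.fit_transform(word_data)  # scipy sparse matrix

# The vocabulary lives on the fitted vectorizer, not on the matrix.
# Newer scikit-learn exposes get_feature_names_out(); older releases
# only have get_feature_names(), so fall back when it is missing.
if hasattr(vectorizer, "get_feature_names_out"):
    voc_words = list(vectorizer.get_feature_names_out())
else:
    voc_words = vectorizer.get_feature_names()

print("Num words:", len(voc_words))
# The sparse matrix itself carries no feature names.
print("Matrix has get_feature_names:", hasattr(transformed, "get_feature_names"))
```

The number of columns in transformed matches len(voc_words), so column j of the matrix corresponds to voc_words[j].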
Answer 2:
Shouldn't it be get_feature_names(), i.e. with an underscore after 'get'?
Also, I am not sure what you are trying to do, but get_feature_names is a method of the *Vectorizer classes only, not of TfidfTransformer. Maybe you want TfidfVectorizer instead?
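If the two-step pipeline from the question is kept, the point above can be sketched as follows: the term names are read from the fitted CountVectorizer, while the TfidfTransformer only reweights the counts. The sample word_data and the variable names are assumptions for illustration, and the code is written for Python 3.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical stand-in for the asker's word_data.
word_data = ["the cat sat on the mat", "the dog sat on the log"]

# Step 1: raw term counts. This object learns the vocabulary.
count_vectorizer = CountVectorizer()
freq_term_mat = count_vectorizer.fit_transform(word_data)

# Step 2: reweight the counts to tf-idf. The column order is unchanged.
tfidf = TfidfTransformer(norm="l2")
tf_idf_matrix = tfidf.fit_transform(freq_term_mat)

# Ask the vectorizer (not the tf-idf matrix) for the term names.
if hasattr(count_vectorizer, "get_feature_names_out"):
    voc_words = list(count_vectorizer.get_feature_names_out())
else:
    voc_words = count_vectorizer.get_feature_names()

print("The num of words =", len(voc_words))
```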
Answer 3:
from sklearn.feature_extraction.text import TfidfVectorizer
TfIdfer = TfidfVectorizer(stop_words = 'english')
TfIdfer.fit_transform(word_data).toarray()
names = TfIdfer.get_feature_names()
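Note that the snippet above discards the array returned by toarray(). A small sketch of how the dense scores and the names might be used together, assuming a made-up word_data and Python 3 (the variable names dense and top_terms are illustrative only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for the asker's word_data.
word_data = ["the cat sat on the mat", "the dog sat on the log"]

tfidfer = TfidfVectorizer(stop_words="english")
# Keep the dense array this time: one row per document, one column per term.
dense = tfidfer.fit_transform(word_data).toarray()

# Use whichever name-lookup method this scikit-learn version provides.
if hasattr(tfidfer, "get_feature_names_out"):
    names = list(tfidfer.get_feature_names_out())
else:
    names = tfidfer.get_feature_names()

# For each document, report the term with the highest tf-idf weight.
top_terms = [names[row.argmax()] for row in dense]
for doc, term in zip(word_data, top_terms):
    print(doc, "->", term)
```

Converting to a dense array is convenient for small examples like this one; for a large corpus the sparse matrix is usually the safer choice for memory.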