I have installed python 2.7, numpy 1.9.0, scipy 0.15.1 and scikit-learn 0.15.2. Now when I do the following in python:
train_set = (\"The sky is blue.\", \"The
Try using the vectorizer.get_feature_names()
method. It gives the column names in the order it appears in the document_term_matrix
.
from sklearn.feature_extraction.text import CountVectorizer
train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
"We can see the shining sun, the bright sun.")
vectorizer = CountVectorizer(stop_words='english')
document_term_matrix = vectorizer.fit_transform(train_set)
vectorizer.get_feature_names()
#> ['blue', 'bright', 'sky', 'sun']