CountVectorizer does not print vocabulary

野趣味 2021-02-06 01:53

I have installed Python 2.7, NumPy 1.9.0, SciPy 0.15.1 and scikit-learn 0.15.2. When I run the following in Python:

train_set = ("The sky is blue.", "The


        
2 Answers
  • 2021-02-06 02:18

    You are missing the trailing underscore: the fitted attribute is vocabulary_, not vocabulary. Try it this way:

    from sklearn.feature_extraction.text import CountVectorizer
    train_set = ("The sky is blue.", "The sun is bright.")
    test_set = ("The sun in the sky is bright.", 
        "We can see the shining sun, the bright sun.")
    
    vectorizer = CountVectorizer(stop_words='english')
    document_term_matrix = vectorizer.fit_transform(train_set)
    print(vectorizer.vocabulary_)
    # {u'blue': 0, u'sun': 3, u'bright': 1, u'sky': 2}
    

    If you use the IPython shell, tab completion makes it easier to find the methods and attributes of an object.
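
The trailing underscore follows the scikit-learn convention that attributes learned during fitting end in an underscore and only exist after fit has been called. As a rough sketch of what tab completion would surface, the fitted attributes can also be listed with plain dir(). The filtering expression and the name fitted_attrs are illustrative assumptions, not part of the scikit-learn API.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")
vectorizer = CountVectorizer(stop_words='english')

# Before fitting, the learned attribute does not exist yet.
print(hasattr(vectorizer, 'vocabulary_'))

vectorizer.fit(train_set)

# After fitting, collect public attribute names that end in an underscore.
# (Hypothetical helper expression, used here only for illustration.)
fitted_attrs = [name for name in dir(vectorizer)
                if name.endswith('_') and not name.startswith('_')]
print(fitted_attrs)
```

The exact list depends on the installed scikit-learn version, but vocabulary_ should be among the names printed after fitting.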

  • 2021-02-06 02:24

    Try using the vectorizer.get_feature_names() method. It returns the column names in the order in which they appear in the document_term_matrix.

    from sklearn.feature_extraction.text import CountVectorizer
    train_set = ("The sky is blue.", "The sun is bright.")
    test_set = ("The sun in the sky is bright.", 
        "We can see the shining sun, the bright sun.")
    
    vectorizer = CountVectorizer(stop_words='english')
    document_term_matrix = vectorizer.fit_transform(train_set)
    vectorizer.get_feature_names()
    #> ['blue', 'bright', 'sky', 'sun']
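
The two answers describe the same mapping from different sides: sorting the keys of vocabulary_ by their column index should reproduce the feature-name list. The sketch below checks that and then labels the counts for test_set, which the snippets above define but never use. One assumption to flag: scikit-learn 1.0 and later provide get_feature_names_out(), and the older get_feature_names() was subsequently removed, so the code picks whichever accessor the installed version offers. The names terms_by_index, feature_names and counts are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
            "We can see the shining sun, the bright sun.")

vectorizer = CountVectorizer(stop_words='english')
vectorizer.fit(train_set)

# Sort the terms by the column index stored in vocabulary_.
terms_by_index = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Use whichever accessor this scikit-learn version provides (assumption:
# newer releases only have get_feature_names_out).
if hasattr(vectorizer, 'get_feature_names_out'):
    feature_names = list(vectorizer.get_feature_names_out())
else:
    feature_names = list(vectorizer.get_feature_names())

# Both views of the vocabulary should agree.
print(terms_by_index == feature_names)

# Columns of the transformed test set follow the same order.
counts = vectorizer.transform(test_set).toarray()
for name, column in zip(feature_names, counts.T):
    print(name, column.tolist())
```

Words in test_set that were not seen during fitting, such as "shining", are simply ignored by transform, because the vocabulary is fixed once the vectorizer has been fitted.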
    