I instantiated a sklearn.feature_extraction.text.CountVectorizer object by passing a vocabulary through the vocabulary
argument, but I get a sklearn.utils.val
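Even though you passed vocabulary=vocabulary_to_load as an argument to sklearn.feature_extraction.text.CountVectorizer(), you still need to call loaded_vectorizer._validate_vocabulary() before you can call loaded_vectorizer.get_feature_names(). In the scikit-learn version that raises this error, the constructor only stores the vocabulary; the fitted attribute vocabulary_ that get_feature_names() checks for is not created until the vocabulary has been validated.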
In your example, you should therefore do the following when creating a CountVectorizer object with your vocabulary:
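import pickle
import sklearn.feature_extraction.text
# Open the pickle in binary mode ('rb'); text mode fails under Python 3.
with open(dictionary_filepath, 'rb') as f:
    vocabulary_to_load = pickle.load(f)
loaded_vectorizer = sklearn.feature_extraction.text.CountVectorizer(
    ngram_range=(ngram_size, ngram_size), min_df=1, vocabulary=vocabulary_to_load)
# Private helper: creates vocabulary_ from the vocabulary passed to the constructor.
loaded_vectorizer._validate_vocabulary()
print('loaded_vectorizer.get_feature_names(): {0}'.format(
    loaded_vectorizer.get_feature_names()))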
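
If you would rather not depend on a private method, one possible alternative is to call transform() first, since it validates a vocabulary supplied to the constructor on its own. The following is only a minimal sketch: the three-word vocabulary and the sample document are hypothetical stand-ins for your unpickled dictionary and your data, and the feature names are read from the vocabulary_ attribute so the snippet does not depend on which get_feature_names variant your scikit-learn version provides.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical vocabulary standing in for the unpickled one (term -> column index).
vocabulary = {"apple": 0, "banana": 1, "cherry": 2}

vectorizer = CountVectorizer(vocabulary=vocabulary)

# transform() validates the supplied vocabulary itself, so neither fit()
# nor the private _validate_vocabulary() has to be called beforehand.
counts = vectorizer.transform(["banana apple banana"])

# After transform(), the fitted attribute vocabulary_ exists.
# Sort the terms by column index to recover the feature order.
feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
count_row = counts.toarray()[0].tolist()

print(feature_names)
print(count_row)
```

Here count_row holds the counts of each vocabulary term in the sample document, in the same column order as feature_names.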