I would like to change the following phrases to vectors with sklearn:
Article 1. It is not good to eat pizza after midnight
Article 2. I wouldn't survive a
Look at the docs: CountVectorizer.fit_transform expects an iterable of strings (e.g. a list of strings), one string per document. You are passing a single string instead.
This makes sense, because fit_transform in scikit-learn does two things: 1) it learns a model (fit), and 2) it applies that model to the data (transform). You want to build a matrix where the columns are all the words in the vocabulary and the rows correspond to the documents. For that, the vectorizer has to see every document in your corpus so it can learn the whole vocabulary (all the columns).
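
A minimal sketch of what that looks like, assuming the default CountVectorizer settings. The first document is Article 1 from the question; the second one is a made-up placeholder (not the asker's Article 2), there only so the corpus has more than one row.

```python
from sklearn.feature_extraction.text import CountVectorizer

# One string per document, collected in a list.
docs = [
    "It is not good to eat pizza after midnight",  # Article 1 from the question
    "Pizza is good",  # hypothetical second document, for illustration only
]

vectorizer = CountVectorizer()

# fit: learn the vocabulary from all documents.
# transform: count how often each vocabulary word occurs in each document.
X = vectorizer.fit_transform(docs)

# vocabulary_ maps each word to its column index; the columns are in
# alphabetical order, so sorting the keys gives the column order.
vocabulary = sorted(vectorizer.vocabulary_)
matrix = X.toarray()  # dense copy, only sensible for small corpora

print(vocabulary)
print(matrix)
```

Each row of the matrix is one document and each column is one word of the learned vocabulary. Once the vectorizer is fitted, new text should go through vectorizer.transform([...]) (again a list), so that it is mapped onto the same columns.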