I have a list of tokenized sentences and would like to fit a TfidfVectorizer on it. I tried the following:
tokenized_list_of_sentences = [['this', 'is', 'one'],
Try initializing the TfidfVectorizer object with the parameter lowercase=False (assuming this is actually what you want, since you've already lowercased your tokens in previous stages).
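A minimal sketch of why the default is likely to fail on pre-tokenized input, assuming that lowercase=True makes the vectorizer call .lower() on each document before tokenizing. The helper lowercase_document is hypothetical and only mimics that step; it is not scikit-learn's own code.

```python
def lowercase_document(doc):
    # Hypothetical stand-in for the lowercasing step; it assumes doc is a str.
    return doc.lower()

# A raw string document can be lowercased without trouble.
print(lowercase_document("This is ONE"))

try:
    # A pre-tokenized document is a list, which has no .lower() method.
    lowercase_document(['this', 'is', 'one'])
except AttributeError as err:
    print(type(err).__name__, err)
```

With lowercase=False this step is skipped, so the token lists reach the tokenizer unchanged.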
from sklearn.feature_extraction.text import TfidfVectorizer

tokenized_list_of_sentences = [['this', 'is', 'one', 'basketball'], ['this', 'is', 'a', 'football']]

def identity_tokenizer(text):
    # The input is already a list of tokens, so return it unchanged.
    return text

tfidf = TfidfVectorizer(tokenizer=identity_tokenizer, stop_words='english', lowercase=False)
tfidf.fit_transform(tokenized_list_of_sentences)
Note that I changed the sentences: the original ones apparently contained only stop words, which caused another error due to an empty vocabulary.
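To spot this situation before fitting, one could filter the token lists by hand. This is a rough sketch only: SAMPLE_STOP_WORDS is a small illustrative set chosen for this example, not scikit-learn's built-in English list, and remaining_vocabulary is a hypothetical helper.

```python
# Illustrative stop-word set only; a real stop-word list is much longer.
SAMPLE_STOP_WORDS = {'this', 'is', 'a', 'one'}

def remaining_vocabulary(tokenized_docs, stop_words):
    """Return the set of tokens that survive stop-word removal."""
    return {tok for doc in tokenized_docs for tok in doc if tok not in stop_words}

docs = [['this', 'is', 'one', 'basketball'], ['this', 'is', 'a', 'football']]
print(sorted(remaining_vocabulary(docs, SAMPLE_STOP_WORDS)))  # ['basketball', 'football']

# A document made up solely of stop words leaves nothing to build a vocabulary from.
print(remaining_vocabulary([['this', 'is', 'one']], SAMPLE_STOP_WORDS))  # set()
```

If the result is empty, either drop stop_words='english' or use sentences that contain at least one non-stop word.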