Creating a TfidfVectorizer over a text column of a huge pandas DataFrame
I need to get a matrix of TF-IDF features from the text stored in a column of a huge DataFrame, loaded from a CSV file that cannot fit in memory. I am trying to iterate over the DataFrame in chunks, but this returns generator objects, which is not the input type TfidfVectorizer expects. I guess I am doing something wrong while writing the generator method ChunkIterator shown below.

    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Works only for a small dataset
    csvfilename = 'data_elements.csv'
    df = pd.read_csv(csvfilename)
    vectorizer =
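For reference, here is a minimal sketch of one way the chunked approach could work. TfidfVectorizer.fit_transform accepts any iterable of strings, so the generator should yield one document string at a time rather than whole chunks or nested generators. The helper name iter_documents, the column name "text", the chunk sizes, and the tiny stand-in CSV are all assumptions made for illustration; they are not taken from the original code.

```python
import os
import tempfile

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def iter_documents(csv_path, text_column, chunksize=10_000):
    """Yield one document string at a time, reading the CSV in chunks.

    Hypothetical helper: name, signature, and default chunksize are assumptions.
    """
    reader = pd.read_csv(csv_path, usecols=[text_column], chunksize=chunksize)
    for chunk in reader:
        # Yield individual strings, not the chunk itself. Missing values
        # are replaced with "" so every item is a str.
        for text in chunk[text_column].fillna("").astype(str):
            yield text


# Tiny stand-in file so the sketch runs end to end; in practice, pass the
# path of the real CSV and the real name of the text column instead.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data_elements.csv")
    sample = pd.DataFrame(
        {"id": [1, 2, 3, 4],
         "text": ["red apple", "green apple", None, "blue sky"]}
    )
    sample.to_csv(csv_path, index=False)

    vectorizer = TfidfVectorizer()
    # The generator is consumed once; only one chunk is held in memory at a time.
    tfidf_matrix = vectorizer.fit_transform(
        iter_documents(csv_path, "text", chunksize=2)
    )
```

Note that only the raw text is streamed here. The resulting sparse matrix and the vocabulary still have to fit in memory, which this sketch assumes is the case.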