I have a 22 GB dataset that I would like to process on my laptop. Of course, I can't load it into memory.
I use sklearn a lot, but only on much smaller datasets.
In
You may want to take a look at Dask or GraphLab:
http://dask.pydata.org/en/latest/
https://turi.com/products/create/
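Both offer pandas-like dataframes that work out-of-core (the Dask dataframe and GraphLab's SFrame), so they can handle data that is larger than memory. The problem with pandas is that all of the data has to fit into memory.
Both frameworks can be used with scikit-learn. You can read the 22 GB of data into a Dask dataframe or an SFrame and then pass it to sklearn. Keep in mind that sklearn estimators still work on in-memory arrays, so for data this size you would typically feed it in chunks to an estimator that supports partial_fit, or reduce it first (filter, aggregate, sample) until it fits.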