sklearn and large datasets

无人及你 2021-01-30 09:11

I have a 22 GB dataset that I would like to process on my laptop. Of course, I can't load it into memory.

I use sklearn a lot, but only for much smaller datasets.

In

4 answers
  •  夕颜
    夕颜 (OP)
    2021-01-30 09:37

    You may want to take a look at Dask or GraphLab:

    • http://dask.pydata.org/en/latest/

    • https://turi.com/products/create/

    Both are similar to pandas but work on large-scale data by using out-of-core dataframes. The problem with pandas is that all of the data has to fit into memory.

    Both frameworks can be used with scikit-learn. You can read the 22 GB of data into a Dask DataFrame or an SFrame, and then use it with sklearn.
