“Large data” work flows using pandas

被撕碎了的回忆 2020-11-21 07:32

I have tried to puzzle out an answer to this question for many months while learning pandas. I use SAS for my day-to-day work, and it is great for its out-of-core support.

16 Answers
  •  难免孤独
    2020-11-21 08:09

    One more variation

    Many of the operations done in pandas can also be done as a database query (SQL, MongoDB).

    Using an RDBMS or MongoDB lets you perform some of the aggregations in the database query, which is optimized for large data and uses caching and indexes efficiently.

    You can then do the post-processing in pandas.
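As a minimal sketch of this split, the example below lets the database do the grouping and pandas do the post-processing. It assumes an in-memory SQLite database; the `sales` table, its columns, and the sample rows are hypothetical and only there to make the snippet runnable.

```python
import sqlite3

import pandas as pd

# Hypothetical data: an in-memory SQLite table standing in for a large DB table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("north", 10.0),
        ("north", 30.0),
        ("south", 5.0),
        ("south", 15.0),
        ("south", 40.0),
    ],
)
conn.commit()

# Step 1: the aggregation runs inside the database, so only one row per
# region is transferred into pandas, not the full table.
query = """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM sales
    GROUP BY region
    ORDER BY region
"""
summary = pd.read_sql_query(query, conn)

# Step 2: post-processing on the small aggregated result in pandas.
summary["share"] = summary["total"] / summary["total"].sum()
summary["avg_order"] = summary["total"] / summary["n_orders"]

print(summary)
```

With a real database the connection would point at the server instead of `:memory:`, but the division of labor stays the same: the heavy `GROUP BY` happens in the database and pandas only sees the summary.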

    The advantage of this method is that you gain the database's optimizations for working with large data while still defining the logic in a high-level declarative syntax, without having to decide what to do in memory and what to do out of core.

    And although the query language and pandas are different, it's usually not complicated to translate part of the logic from one to the other.
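To illustrate how closely the two can map onto each other, here is a rough sketch that expresses the same filter-and-average step once as SQL and once as pandas. The `events` table, its columns, and the values are hypothetical, and SQLite is assumed only because it ships with Python.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, loaded into both pandas and an in-memory SQLite DB.
df = pd.DataFrame(
    {
        "category": ["a", "b", "a", "c", "b", "a"],
        "value": [1, 20, 3, 40, 5, 60],
    }
)
conn = sqlite3.connect(":memory:")
df.to_sql("events", conn, index=False)

# SQL version: WHERE filters rows, GROUP BY + AVG aggregates them.
sql_result = pd.read_sql_query(
    """
    SELECT category, AVG(value) AS mean_value
    FROM events
    WHERE value > 2
    GROUP BY category
    ORDER BY category
    """,
    conn,
)

# pandas version: boolean mask ~ WHERE, groupby(...).mean() ~ GROUP BY + AVG.
pandas_result = (
    df[df["value"] > 2]
    .groupby("category", as_index=False)["value"]
    .mean()
    .rename(columns={"value": "mean_value"})
)

print(sql_result)
print(pandas_result)
```

Both results contain one row per category with the same means, so either side of the pipeline can own this step depending on where the data is large.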
