Memory-efficient filtering of `DataFrame` rows

时光取名叫无心 2021-01-23 00:22

I have a large DataFrame object (1,440,000,000 rows), and I am operating at the memory limit (swap included).

I need to extract a subset of the rows with a certain value o
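For scale, here is a rough back-of-the-envelope sketch. It assumes the filter is written as an ordinary boolean mask, for example df[df["field"] == value], where field and value are placeholders. NumPy stores each bool in one byte, so the temporary mask for 1,440,000,000 rows needs about 1.44 GB on its own, before any rows are copied:

```python
import numpy as np

# Row count taken from the question.
N_ROWS = 1_440_000_000


def mask_nbytes(n_rows: int) -> int:
    """Bytes taken by a NumPy boolean mask with one element per row."""
    # NumPy stores each bool in a single byte (not a single bit).
    return n_rows * np.dtype(bool).itemsize


# Sanity check the formula against a real (small) mask.
small_mask = np.arange(1_000) % 2 == 0
assert small_mask.nbytes == mask_nbytes(1_000)

print(f"{mask_nbytes(N_ROWS) / 1e9:.2f} GB")  # 1.44 GB
```

On top of the mask, boolean indexing returns a new DataFrame holding a copy of the matching rows, so the peak footprint is roughly the original frame plus the mask plus the selection.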

2 answers
  • 2021-01-23 00:57

    If by any chance all the columns in the DataFrame have the same type, use a NumPy array instead; it is usually more memory-efficient and faster. You can convert your DataFrame to a NumPy array with df.to_numpy() (the older df.as_matrix() was deprecated in pandas 0.23 and removed in 1.0).

    You might also want to check how much memory the DataFrame already takes:

        import sys
        sys.getsizeof(df)  # df is your DataFrame
    

    It returns the size in bytes.

  • 2021-01-23 01:06

    Use DataFrame.query; it should be a bit faster than plain boolean indexing:

        df = df.query("field == @value")  # @value refers to a Python variable named value
    
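A small sketch of both answers on a toy frame. It assumes pandas 1.0 or later, where as_matrix() no longer exists and to_numpy() is used instead, and it uses hypothetical column names field and other. The sketch filters once through a NumPy array and once through query, then checks memory with both sys.getsizeof and DataFrame.memory_usage(deep=True):

```python
import sys

import numpy as np
import pandas as pd

# Toy stand-in for the large frame; "field" and "other" are hypothetical column names.
df = pd.DataFrame({
    "field": np.array([1, 2, 1, 3, 1], dtype=np.int64),
    "other": np.array([10, 20, 30, 40, 50], dtype=np.int64),
})
value = 1  # placeholder for the value being filtered on

# First answer: every column is int64, so a plain 2-D NumPy array works.
arr = df.to_numpy()                # replaces the removed df.as_matrix()
col = df.columns.get_loc("field")  # integer position of the "field" column
arr_subset = arr[arr[:, col] == value]

# Second answer: DataFrame.query; "@value" refers to the Python variable above.
df_subset = df.query("field == @value")

# Memory checks. For a DataFrame, sys.getsizeof goes through pandas' __sizeof__,
# which is based on memory_usage(deep=True); getsizeof may add a small GC overhead.
print(sys.getsizeof(df))
print(df.memory_usage(deep=True).sum())
```

Either way, the comparison still builds a full-length boolean mask, and the result is a copy of the matching rows.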