I have a large DataFrame object (1,440,000,000 rows). I am operating at the memory limit (swap included).
I need to extract a subset of the rows with certain value o
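If by any chance all the data in the DataFrame are of the same type, use a NumPy array instead; it's more memory efficient and faster. You can convert your DataFrame to a NumPy array with df.to_numpy() (older pandas versions used df.as_matrix(), which was deprecated and then removed in pandas 1.0).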
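As a rough sketch of the NumPy route, assuming a pandas version that provides to_numpy() (0.24 or later): the tiny frame and the column names "a" and "b" are made up for illustration, and the filter condition is only an example.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame; every column shares one dtype (float64).
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

arr = df.to_numpy()        # 2-D ndarray with a single dtype
mask = arr[:, 0] > 1.0     # boolean mask built from the first column ("a")
subset = arr[mask]         # keeps only the rows where a > 1.0
```

Note that the column labels are lost after the conversion, so you have to keep track of which column index corresponds to which field yourself.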
You might also want to check how much memory the DataFrame already takes:
import sys
sys.getsizeof(df)
which returns the size in bytes.
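For a per-column breakdown, pandas also has DataFrame.memory_usage. A minimal sketch, with a made-up frame (the column names "field" and "name" are just placeholders):

```python
import sys
import pandas as pd

# Hypothetical frame: one integer column and one string (object) column.
df = pd.DataFrame({"field": [1, 2, 3], "name": ["x", "y", "z"]})

size_getsizeof = sys.getsizeof(df)        # total size in bytes, as reported by the DataFrame
per_column = df.memory_usage(deep=True)   # Series of bytes per column, plus the index
total_bytes = int(per_column.sum())       # overall total in bytes
```

deep=True makes pandas inspect the actual Python objects in object columns (strings, for example); without it those columns are only counted by their pointer size, so the number can come out much too low.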
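Use query; it may be a bit faster on a frame this large. Here field is a column name and value is a literal (to refer to a Python variable, write @value):
df = df.query("field == value")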
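A small sketch of how that looks with a Python variable, next to the equivalent boolean-mask form. The frame, the column names and the value 2 are invented for the example:

```python
import pandas as pd

# Hypothetical frame; "field" is the column we filter on.
df = pd.DataFrame({"field": [1, 2, 2, 3], "other": [10, 20, 30, 40]})

value = 2
subset = df.query("field == @value")   # "@" refers to the local Python variable
same = df[df["field"] == value]        # equivalent boolean-mask selection
```

Both forms return a new object rather than filtering in place, so while the result is being built the original frame and the subset exist side by side. Whether query is actually faster depends on the data and on whether numexpr is installed, so it is worth timing both on a sample first.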