Reading large text files with Pandas [duplicate]

Submitted by 孤者浪人 on 2019-12-21 05:07:09

Question


I have been trying to read a few large text files (around 1.4 GB - 2 GB each) with Pandas, using the read_csv function, but to no avail. Below are the versions I am using:

  • Python 2.7.6
  • Anaconda 1.9.2 (64-bit) (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)]
  • IPython 1.1.0
  • Pandas 0.13.1

I tried the following:

df = pd.read_csv('data.txt')

and it crashed IPython with the message: Kernel died, restarting.

Then I tried using an iterator:

tp = pd.read_csv('data.txt', iterator=True, chunksize=1000)

Again, I got the same Kernel died, restarting error.

Any ideas? Or any other way to read big text files?

Thank you!


Answer 1:


A solution for a similar question was given here some time after this question was posted. Basically, it suggests reading the file in chunks, like this:

chunksize = 10 ** 6  # number of rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

You should set the chunksize parameter according to your machine's capabilities (that is, make sure each chunk fits in memory and can be processed).
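If you need a single DataFrame at the end, one common pattern is to reduce each chunk (filter rows, select columns) and concatenate the results, so only the reduced data stays in memory. A minimal sketch, where the filename and the 'value' column filter are placeholders for illustration:

import pandas as pd

chunksize = 10 ** 6          # rows per chunk; tune to your available memory
filtered_parts = []

for chunk in pd.read_csv('data.txt', chunksize=chunksize):
    # Keep only the rows you actually need from each chunk so the
    # accumulated result stays much smaller than the raw file.
    filtered_parts.append(chunk[chunk['value'] > 0])

df = pd.concat(filtered_parts, ignore_index=True)

Passing usecols= and explicit dtype= to read_csv can also cut memory use considerably if you only need a subset of the columns.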



Source: https://stackoverflow.com/questions/23411619/reading-large-text-files-with-pandas
