Question
Pandas has the excellent .read_table() function, but huge files result in a MemoryError.
Since I only need to load the lines that satisfy a certain condition, I'm looking for a way to load only those.
This could be done using a temporary file:
import pandas

with open(hugeTdaFile) as huge:
    with open(hugeTdaFile + ".partial.tmp", "w") as tmp:
        tmp.write(huge.readline())  # copy the header line
        for line in huge:
            if SomeCondition(line):
                tmp.write(line)
t = pandas.read_table(tmp.name)
Is there a way to avoid using a temp file like this?
Answer 1:
You can use the chunksize parameter to get an iterator over chunks of the file.
See this: http://pandas.pydata.org/pandas-docs/stable/io.html#iterating-through-files-chunk-by-chunk
- filter each chunk frame however you want
- append the filtered frames to a list
- concat the list at the end
(Alternatively, you could write the filtered chunks out to new CSVs, HDFStores, or another on-disk format.)
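For example, a minimal sketch of that pattern, assuming the filter can be expressed as a boolean test over a chunk's rows. The file name "huge.tda", the "value" column, and the threshold are placeholders standing in for the question's hugeTdaFile and SomeCondition:

import pandas as pd

filtered_chunks = []
# With chunksize set, read_table returns an iterator of DataFrames,
# so only one chunk is held in memory at a time.
for chunk in pd.read_table("huge.tda", chunksize=100000):
    # Placeholder condition: keep rows where the (assumed) "value"
    # column is positive; substitute your own filter here.
    filtered_chunks.append(chunk[chunk["value"] > 0])

# Combine the filtered pieces into a single DataFrame.
t = pd.concat(filtered_chunks, ignore_index=True)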
Source: https://stackoverflow.com/questions/15088190/what-is-the-easiest-way-to-load-a-filtered-tda-file-using-pandas