ValueError: array is too big

Anonymous (unverified), submitted 2019-12-03 01:39:01

Question:

I am trying to merge two Excel files using the following code, and I get the error "ValueError: array is too big; arr.size * arr.dtype.itemsize is larger than the maximum possible size."

    import pandas as pd

    file1 = pd.read_excel("file1.xlsx")
    file2 = pd.read_excel("file2.xlsx")

    file3 = file1.merge(file2, on="Input E-mail", how="outer")

    file3.to_excel("merged1.xlsx")

The files are about 100 MB each, and 9 GB of RAM is available (of 16 GB total).

Answer 1:

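Your resulting DataFrame can be much larger than your two input ones: when a key value is repeated in both tables, the merge produces one row for every pairing of matching rows. Simple example: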

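    import pandas as pd

    # Four rows and three rows that all share the same key, id == 1
    values = pd.DataFrame({"id": [1, 1, 1, 1], "value": ["a", "b", "c", "d"]})
    users = pd.DataFrame({"id": [1, 1, 1], "users": ["Amy", "Bob", "Dan"]})

    # With no "on" argument, merge joins on the shared column "id"
    big_table = pd.merge(users, values, how="outer")

    print(big_table)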

Result:

        id users value
    0    1   Amy     a
    1    1   Amy     b
    2    1   Amy     c
    3    1   Amy     d
    4    1   Bob     a
    5    1   Bob     b
    6    1   Bob     c
    7    1   Bob     d
    8    1   Dan     a
    9    1   Dan     b
    10   1   Dan     c
    11   1   Dan     d
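To check whether this is what is happening before running the real merge, one option is to count how often each key occurs on each side and work out the output size from those counts. The following is only a sketch: the helper name estimate_outer_merge_rows is made up for illustration, and it assumes a single key column with no missing values.

```python
import pandas as pd


def estimate_outer_merge_rows(left, right, key):
    """Estimate the row count of an outer merge of left and right on key.

    Hypothetical helper; assumes the key column has no missing values.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()

    # Keys found on both sides yield (left count * right count) rows each.
    both = left_counts.index.intersection(right_counts.index)
    matched = int((left_counts.loc[both] * right_counts.loc[both]).sum())

    # Keys found on one side only pass through unchanged in an outer merge.
    left_only = int(left_counts.drop(both).sum())
    right_only = int(right_counts.drop(both).sum())

    return matched + left_only + right_only


values = pd.DataFrame({"id": [1, 1, 1, 1], "value": ["a", "b", "c", "d"]})
users = pd.DataFrame({"id": [1, 1, 1], "users": ["Amy", "Bob", "Dan"]})

print(estimate_outer_merge_rows(users, values, "id"))  # 3 * 4 = 12
```

For the files in the question, the equivalent call would be estimate_outer_merge_rows(file1, file2, "Input E-mail"). If the estimate is far larger than either input, repeated e-mail values are the likely cause, and dropping or aggregating the duplicates before merging may keep the result manageable.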

