Memory error using openpyxl and large Excel files

失恋的感觉 2021-01-04 23:32

I have written a script that has to read a lot of Excel files from a folder (around 10,000). The script loads each Excel file (some of them have more than 2,000 rows) and read…

4 Answers
  •  花落未央
    2021-01-04 23:50

    The default implementation of openpyxl stores every accessed cell in memory. I would suggest using the optimized reader instead (link: https://openpyxl.readthedocs.org/en/latest/optimized.html).

    In code:

    from openpyxl import load_workbook

    wb = load_workbook(file_path, use_iterators=True)
    

    While loading a workbook, pass use_iterators=True. Then access the sheet and cells like this:

    for row in sheet.iter_rows():
        for cell in row:
            cell_text = cell.value
    

    This will reduce the memory footprint to roughly 5-10% of what the default reader uses.

    UPDATE: In version 2.4.0 the use_iterators=True option was removed. Newer versions introduce openpyxl.writer.write_only.WriteOnlyWorksheet for dumping large amounts of data:

    from openpyxl import Workbook
    wb = Workbook(write_only=True)
    ws = wb.create_sheet()
    
    # now we'll fill it with 100 rows x 200 columns
    for irow in range(100):
        ws.append(['%d' % i for i in range(200)])
    
    # save the file
    wb.save('new_big_file.xlsx') 
    

    I have not tested the code above; it is copied from the linked documentation.
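
    For the reading side of the question, newer openpyxl versions replace use_iterators with read-only mode (load_workbook(..., read_only=True)). A minimal sketch, assuming the workbooks live in a hypothetical data/ folder:

    from glob import glob
    from openpyxl import load_workbook

    for path in glob('data/*.xlsx'):  # 'data/' is a placeholder folder name
        wb = load_workbook(path, read_only=True)
        ws = wb.active
        for row in ws.iter_rows(values_only=True):
            pass  # process the tuple of cell values here
        wb.close()  # read-only workbooks hold the file open until closed

    Closing each workbook matters when looping over thousands of files, since read-only mode keeps the underlying file handle open.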

    Thanks @SdaliM for the information.
