I have written a script that has to read a lot of Excel files from a folder (around 10,000). The script loads each Excel file (some of them have more than 2,000 rows) and reads
By default, openpyxl keeps every accessed cell in memory. I suggest using the Optimized Reader (link - https://openpyxl.readthedocs.org/en/latest/optimized.html) instead.
In code: while loading a workbook, pass use_iterators = True:

from openpyxl import load_workbook

wb = load_workbook(file_path, use_iterators=True)

Then access the sheet and cells like this:
sheet = wb.active  # e.g. the active worksheet
for row in sheet.iter_rows():
    for cell in row:
        cell_text = cell.value
This will reduce the memory footprint to roughly 5-10% of what the default reader uses.
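Because the question involves roughly 10,000 workbooks, here is a minimal sketch of applying that iterator-based reader to every file in the folder; it assumes an openpyxl 2.x release that still supports use_iterators and a hypothetical folder_path variable:

import glob
import os

from openpyxl import load_workbook

folder_path = '/path/to/excel/folder'  # hypothetical location of the workbooks
for file_path in glob.glob(os.path.join(folder_path, '*.xlsx')):
    wb = load_workbook(file_path, use_iterators=True)
    sheet = wb.active  # e.g. the active worksheet in each file
    for row in sheet.iter_rows():
        for cell in row:
            cell_text = cell.value  # handle the value here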
UPDATE: In version 2.4.0 the use_iterators = True option was removed. In newer versions, openpyxl.writer.write_only.WriteOnlyWorksheet was introduced for dumping large amounts of data (for reading, the replacement is read_only = True; a read-only sketch follows the write-only example below):
from openpyxl import Workbook
wb = Workbook(write_only=True)
ws = wb.create_sheet()
# now we'll fill it with 100 rows x 200 columns
for irow in range(100):
    ws.append(['%d' % i for i in range(200)])
# save the file
wb.save('new_big_file.xlsx')
I have not tested the write-only example above; it is just copied from the documentation linked above.
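Since the original problem is about reading rather than writing, here is a minimal sketch of the modern read path for openpyxl 2.4 and later, where read_only = True takes the place of use_iterators = True (file_path is assumed as in the earlier snippet):

from openpyxl import load_workbook

wb = load_workbook(file_path, read_only=True)
sheet = wb.active  # or wb[sheet_name] for a specific sheet
for row in sheet.iter_rows():
    for cell in row:
        cell_text = cell.value  # process the value here
wb.close()  # read-only mode keeps the source file open until closed

Closing the workbook matters when looping over thousands of files, because each read-only workbook holds its source file handle open.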
Thanks to @SdaliM for the information.