Memory error using openpyxl and large Excel files

失恋的感觉 2021-01-04 23:32

I have written a script that has to read a lot of Excel files from a folder (around 10,000). The script loads each Excel file (some of them have more than 2,000 rows) and read…

4 Answers
  •  花落未央
    2021-01-04 23:50

    The default implementation of openpyxl stores every accessed cell in memory. I would suggest using the optimized reader instead (link: https://openpyxl.readthedocs.org/en/latest/optimized.html).

    In code:

    from openpyxl import load_workbook

    wb = load_workbook(file_path, use_iterators=True)
    

    While loading a workbook, pass use_iterators=True. Then access the sheet and cells like this:

    for row in sheet.iter_rows():
        for cell in row:
            cell_text = cell.value
    

    This will reduce the memory footprint to roughly 5-10% of what the default reader uses.

    UPDATE: In version 2.4.0 the use_iterators=True option was removed. Newer versions introduce openpyxl.writer.write_only.WriteOnlyWorksheet for dumping large amounts of data:

    from openpyxl import Workbook
    wb = Workbook(write_only=True)
    ws = wb.create_sheet()
    
    # now we'll fill it with 100 rows x 200 columns
    for irow in range(100):
        ws.append(['%d' % i for i in range(200)])
    
    # save the file
    wb.save('new_big_file.xlsx') 
    

    I have not tested the code above; it is copied from the linked documentation.
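
    For the reading side of the question, newer openpyxl versions replace use_iterators with read-only mode (load_workbook(..., read_only=True)). A minimal sketch, assuming the workbooks live in a hypothetical data/ folder:

    from glob import glob
    from openpyxl import load_workbook

    for path in glob('data/*.xlsx'):  # 'data/' is a placeholder folder name
        wb = load_workbook(path, read_only=True)
        ws = wb.active
        for row in ws.iter_rows(values_only=True):
            pass  # process the tuple of cell values here
        wb.close()  # read-only workbooks hold the file open until closed

    Closing each workbook matters when looping over thousands of files, since read-only mode keeps the underlying file handle open.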

    Thanks @SdaliM for the information.
