How to open a huge Excel file efficiently

佛祖请我去吃肉 2021-01-30 21:29

I have a 150MB one-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:

# using python
import xlrd
wb = xlrd.open_workbook('large_file.xlsx')


        
11 Answers
  •  醉酒成梦
    2021-01-30 21:47

    Have you tried loading the worksheet on demand, which has been available since version 0.7.1 of xlrd?

    To do this you need to pass on_demand=True to open_workbook().

    xlrd.open_workbook(filename=None, logfile=<_io.TextIOWrapper name='' mode='w' encoding='UTF-8'>, verbosity=0, use_mmap=1, file_contents=None, encoding_override=None, formatting_info=False, on_demand=False, ragged_rows=False)
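
    For example, a minimal sketch of that pattern (the file name is a placeholder, and how much on_demand helps can depend on the file format and xlrd version):

      import xlrd

      # Parse only the workbook globals up front; load sheets lazily
      book = xlrd.open_workbook('large_file.xlsx', on_demand=True)
      for name in book.sheet_names():
          sheet = book.sheet_by_name(name)   # loads just this sheet
          print(sheet.nrows, sheet.ncols)
          book.unload_sheet(name)            # release its memory again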


    Other potential Python solutions I found for reading an .xlsx file:

    • Read the raw XML in 'xl/sharedStrings.xml' and 'xl/worksheets/sheet1.xml' (a rough sketch follows after this list)
    • Try the openpyxl library's Read Only mode, which claims to be optimized in memory usage for large files.

      from openpyxl import load_workbook
      wb = load_workbook(filename='large_file.xlsx', read_only=True)
      ws = wb['big_data']
      
      for row in ws.rows:
          for cell in row:
              print(cell.value)
      
    • If you are running on Windows, you could use PyWin32 and 'Excel.Application':

      import time
      import win32com.client as win32

      def excel():
          # Start (or attach to) an Excel instance via COM
          xl = win32.gencache.EnsureDispatch('Excel.Application')
          ss = xl.Workbooks.Add()
      ...
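
    As promised under the first bullet, here is a rough sketch of reading the raw XML directly with the standard library; the part names match the defaults for a single-sheet workbook, and 'large_file.xlsx' is a placeholder:

      import zipfile
      import xml.etree.ElementTree as ET

      NS = '{http://schemas.openxmlformats.org/spreadsheetml/2006/main}'

      with zipfile.ZipFile('large_file.xlsx') as z:
          # Build the shared-strings table: one entry per <si> element
          with z.open('xl/sharedStrings.xml') as f:
              shared = [''.join(t.text or '' for t in si.iter(NS + 't'))
                        for si in ET.parse(f).getroot().iter(NS + 'si')]

          # Stream the sheet XML instead of loading it all into memory
          with z.open('xl/worksheets/sheet1.xml') as f:
              for _, el in ET.iterparse(f):
                  if el.tag == NS + 'c':                  # a cell element
                      value = el.findtext(NS + 'v')
                      if el.get('t') == 's' and value is not None:
                          value = shared[int(value)]      # shared-string lookup
                      # ... process value here ...
                      el.clear()                          # limit memory growth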
      
