I have a 150 MB, single-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:
# using python
import xlrd
wb = xlrd.open_
Have you tried loading the worksheets on demand, which has been available since version 0.7.1 of xlrd?
To do this, pass on_demand=True to open_workbook():
xlrd.open_workbook(filename=None, logfile=<_io.TextIOWrapper name='' mode='w' encoding='UTF-8'>, verbosity=0, use_mmap=1, file_contents=None, encoding_override=None, formatting_info=False, on_demand=False, ragged_rows=False)
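As a rough illustration of the on-demand approach, here is a minimal sketch. The function name `first_rows` and its `limit` parameter are made up for this example; the xlrd calls (`open_workbook`, `sheet_by_index`, `row_values`, `unload_sheet`, `release_resources`) are the library's own. Keep in mind that on_demand only defers parsing of each sheet until it is requested, so with a single huge sheet the saving may be small.

```python
def first_rows(path, sheet_index=0, limit=5):
    """Return up to `limit` rows from one sheet, loading only that sheet.

    `first_rows` and `limit` are hypothetical names used for illustration.
    """
    import xlrd  # third-party dependency (pip install xlrd)

    # With on_demand=True, no sheet is parsed when the workbook is opened.
    wb = xlrd.open_workbook(path, on_demand=True)
    try:
        # The requested sheet is parsed here, on first access.
        sheet = wb.sheet_by_index(sheet_index)
        n = min(limit, sheet.nrows)
        rows = [sheet.row_values(r) for r in range(n)]
        # Drop the parsed sheet again once we are done with it.
        wb.unload_sheet(sheet_index)
        return rows
    finally:
        # Needed with on_demand=True to release the underlying file resources.
        wb.release_resources()

# Example call (the path is a placeholder):
# rows = first_rows("large_file.xls", limit=10)
```

If only a few of many sheets are needed, this avoids paying the parsing cost for the rest.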
Other potential python solutions I found for reading an xlsx file:
Try the openpyxl library's read-only mode, which claims to be optimized for memory usage with large files.
from openpyxl import load_workbook
wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']
for row in ws.rows:
    for cell in row:
        print(cell.value)
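If you only need the values rather than cell objects, a variation on the read-only approach is to stream rows with `iter_rows(values_only=True)` and close the workbook explicitly when done. The sketch below is only an illustration: `count_nonempty_rows` is a made-up helper, and falling back to the first worksheet when no sheet name is given is an assumption of this example.

```python
from openpyxl import load_workbook

def count_nonempty_rows(path, sheet_name=None):
    """Stream a worksheet in read-only mode and count rows that hold data.

    `count_nonempty_rows` is a hypothetical helper used for illustration.
    """
    wb = load_workbook(filename=path, read_only=True)
    try:
        # Assumption: use the first worksheet when no name is given.
        ws = wb[sheet_name] if sheet_name else wb.worksheets[0]
        count = 0
        # values_only=True yields plain tuples of values instead of cell objects.
        for row in ws.iter_rows(values_only=True):
            if any(value is not None for value in row):
                count += 1
        return count
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()

# Example call (file and sheet names taken from the snippet above):
# n = count_nonempty_rows('large_file.xlsx', 'big_data')
```

Replacing the counter with your own per-row processing keeps memory usage roughly constant, since rows are handled one at a time.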
If you are running on Windows, you could use PyWin32 and 'Excel.Application':
import time
import win32com.client as win32
def excel():
    xl = win32.gencache.EnsureDispatch('Excel.Application')
    ss = xl.Workbooks.Add()
...