I have a 150 MB single-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:
# using python
import xlrd
# open_workbook() parses the whole file eagerly, which is where the time goes
wb = xlrd.open_workbook('large_file.xlsx')  # file name is illustrative
Looks like this is hardly achievable in Python at all. If we unpack the sheet data file, it takes a full 30 seconds just to pass it through the C-based iterative SAX parser (using lxml, a very fast wrapper over libxml2):
from __future__ import print_function
from lxml import etree
import time
start_ts = time.time()
# open in binary mode: lxml expects bytes, not decoded text
for data in etree.iterparse(open('xl/worksheets/sheet1.xml', 'rb'),
                            events=('start',), collect_ids=False,
                            resolve_entities=False, huge_tree=True):
    pass
print(time.time() - start_ts)
Sample output: 27.2134890556
By the way, Excel itself needs about 40 seconds to load the same workbook.
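Incidentally, the "unpack" step is not strictly required: an .xlsx file is just a ZIP archive, so the sheet XML can be streamed straight out of it with the same iterparse loop. Here is a minimal, self-contained sketch of that approach; it builds a tiny stand-in archive in memory for demonstration, whereas with a real file you would simply pass its path to zipfile.ZipFile.

```python
import io
import time
import zipfile
from lxml import etree

# Build a tiny stand-in .xlsx-like archive in memory (1000 rows, one cell
# each). With a real workbook, replace buf with the path to the .xlsx file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    rows = ''.join('<row r="%d"><c><v>%d</v></c></row>' % (i, i)
                   for i in range(1, 1001))
    zf.writestr('xl/worksheets/sheet1.xml',
                '<worksheet><sheetData>%s</sheetData></worksheet>' % rows)

start_ts = time.time()
cells = 0
with zipfile.ZipFile(buf) as zf:
    # zf.open() returns a file-like object that iterparse can stream from,
    # so the sheet XML is never extracted to disk.
    with zf.open('xl/worksheets/sheet1.xml') as sheet:
        for _event, elem in etree.iterparse(sheet, events=('end',), tag='v'):
            cells += 1
            elem.clear()  # release parsed elements to keep memory flat
print(cells, time.time() - start_ts)
```

The `tag='v'` filter makes iterparse yield only the cell-value elements, and clearing each element as it is consumed keeps memory usage roughly constant regardless of sheet size.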