How to open a huge Excel file efficiently

佛祖请我去吃肉 2021-01-30 21:29

I have a 150 MB single-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:

# using python
import xlrd
wb = xlrd.open_         


        
11 answers
  •  被撕碎了的回忆
    2021-01-30 21:45

    This looks hardly achievable in Python at all. Even if we unpack the sheet's data file from the workbook, it takes the entire 30 seconds allowed just to pass that file through a C-based iterative, SAX-style parser (using lxml, a very fast wrapper over libxml2):

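    from __future__ import print_function

    import time

    from lxml import etree

    start_ts = time.time()

    # Open in binary mode so the parser receives bytes and detects the encoding itself.
    with open('xl/worksheets/sheet1.xml', 'rb') as sheet:
        # Only walk the event stream; nothing is done with the elements.
        for data in etree.iterparse(sheet, events=('start',),
                                    collect_ids=False, resolve_entities=False,
                                    huge_tree=True):
            pass

    # Elapsed wall-clock time in seconds.
    print(time.time() - start_ts)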
    

    Sample output (seconds): 27.2134890556

    By the way, Excel itself needs about 40 seconds to load the workbook.
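
The unpacking step mentioned in this answer can be skipped by streaming the sheet straight out of the .xlsx archive, which is an ordinary ZIP file. The following is only a minimal sketch, not the answerer's method: it swaps lxml for the standard library's `xml.etree.ElementTree` so it has no third-party dependency, and the function name `count_rows` and its parameters are hypothetical. It assumes the sheet lives at `xl/worksheets/sheet1.xml`, as in the timing test above, and it will most likely be slower than lxml on a file of this size.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = '{http://schemas.openxmlformats.org/spreadsheetml/2006/main}'


def count_rows(xlsx_path, member='xl/worksheets/sheet1.xml'):
    """Stream one worksheet part out of an .xlsx archive and count its <row> elements.

    `count_rows`, `xlsx_path` and `member` are illustrative names only.
    """
    rows = 0
    with zipfile.ZipFile(xlsx_path) as archive:
        # archive.open() yields a binary stream that is decompressed on the fly,
        # so the sheet never has to be extracted to disk.
        with archive.open(member) as sheet:
            for _event, elem in ET.iterparse(sheet, events=('end',)):
                if elem.tag == NS + 'row':
                    rows += 1
                    # Drop the row's cells once it has been seen to keep memory use low.
                    elem.clear()
    return rows
```

Calling `count_rows('workbook.xlsx')` (a hypothetical file name) returns the number of rows in the first sheet. Real processing would read the cell values inside the `row` branch before calling `elem.clear()`.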
