How to open a huge Excel file efficiently


佛祖请我去吃肉 2021-01-30 21:29

I have a 150 MB, single-sheet Excel file that takes about 7 minutes to open on a very powerful machine using the following:

# using python
import xlrd
wb = xlrd.open_

11 answers

醉酒成梦 (original poster) 2021-01-30 21:53
                        
I managed to read the file in about 30 seconds using .NET Core and the Open XML SDK.
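
For readers staying in Python, as in the question, the same streaming idea can be sketched with only the standard library. This is a minimal sketch under stated assumptions, not a port of the .NET code: `iter_rows` and `load_shared_strings` are hypothetical helper names, the sheet is assumed to be stored at `xl/worksheets/sheet1.xml` inside the archive, dates are left as raw numeric serials, and inline strings are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML elements inside an .xlsx archive.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def load_shared_strings(archive):
    """Return the shared-string table as a list indexed by string id."""
    try:
        stream = archive.open("xl/sharedStrings.xml")
    except KeyError:
        return []  # workbook has no shared-string part
    strings = []
    with stream:
        for _, el in ET.iterparse(stream):
            if el.tag == NS + "si":
                # Join all text runs of one shared-string item.
                strings.append("".join(t.text or "" for t in el.iter(NS + "t")))
                el.clear()
    return strings


def iter_rows(path, sheet="xl/worksheets/sheet1.xml"):
    """Yield each row as a list of values, parsing the sheet XML incrementally.

    Assumption: `sheet` is the archive path of the worksheet; a real
    workbook records this in its relationship files.
    """
    with zipfile.ZipFile(path) as archive:
        shared = load_shared_strings(archive)
        with archive.open(sheet) as stream:
            for _, el in ET.iterparse(stream):
                if el.tag != NS + "row":
                    continue
                row = []
                for cell in el.iter(NS + "c"):
                    value = cell.find(NS + "v")
                    text = value.text if value is not None else None
                    kind = cell.get("t")
                    if text is None:
                        row.append(None)               # empty or inline-string cell
                    elif kind == "s":
                        row.append(shared[int(text)])  # index into shared strings
                    elif kind in (None, "n"):
                        row.append(float(text))        # numbers (and date serials)
                    else:
                        row.append(text)               # e.g. formula string results
                yield row
                el.clear()  # drop the row's cells once they have been consumed
```

A caller would loop over the generator, for example `for row in iter_rows("big.xlsx"): ...`, where `big.xlsx` is a placeholder file name. Yielding rows one at a time avoids holding the whole sheet in a list, which is a design choice of this sketch rather than something the answer prescribes.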

The following example returns a list of rows, each a list of cell values with matching types; it supports date, numeric, and text cells. The project is available here: https://github.com/xferaa/BigSpreadSheetExample/ (it should work on Windows, Linux, and macOS, and does not require Excel or any Excel component to be installed).

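public List<List<object>> ParseSpreadSheet()
{
    List<List<object>> rows = new List<List<object>>();

    // Open the workbook read-only (second argument: isEditable = false).
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

        // Forward-only reader: streams the sheet XML instead of loading the whole DOM.
        OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

        // Cache shared strings by index so text cells can be resolved quickly.
        Dictionary<int, string> sharedStringCache = new Dictionary<int, string>();

        int i = 0;
        foreach (var el in workbookPart.SharedStringTablePart.SharedStringTable.ChildElements)
        {
            sharedStringCache.Add(i++, el.InnerText);
        }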

        while (reader.Read())
        {
            if(reader.ElementType == typeof(Row))
            {
                reader.ReadFirstChild();

                List