OpenXML does not help to read large Excel files contrary to documentation

你的背包 2021-01-23 16:57

The documentation says that:

The following code segment is used to read a very large Excel 
file using the DOM approach.

and then goes an examp

1 Answer
  • 2021-01-23 17:28

    You seem to have a few questions; I'll try to tackle them one by one.

    So, I need to know whether OpenXML really helps to read large files. And, if not, what are the alternatives (Interop does not help - I've already checked it).

    Yes, the Open XML SDK is great for reading large files, but you may need to use a SAX approach rather than a DOM approach. From the same documentation you cite:

    However, the DOM approach requires loading entire Open XML parts into memory, which can cause an Out of Memory exception when you are working with really large files.... Consider using SAX when you need to handle very large files.

    The DOM approach loads the whole sheet into memory, which for a large sheet can cause out-of-memory exceptions. With the SAX approach you read each element in turn, which reduces memory consumption considerably.
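
    As a rough sketch of the SAX approach (not the code from the documentation), OpenXmlReader can walk a worksheet part element by element, so the sheet is never loaded as a whole. The names SaxCellReader and ReadCellValues are made up for illustration. The values returned are the raw contents of each cell's value element, so for shared-string cells you get an index into the shared string table rather than the text itself.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, named for illustration only.
public static class SaxCellReader
{
    // Lazily yields the raw text of every CellValue (<v>) element in the part.
    // Shared-string cells yield their index into the shared string table,
    // not the string itself.
    public static IEnumerable<string> ReadCellValues(WorksheetPart worksheetPart)
    {
        using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
        {
            // Read() advances one element at a time instead of building a DOM tree.
            while (reader.Read())
            {
                // Only act when positioned on the start of a CellValue element.
                if (reader.ElementType == typeof(CellValue) && reader.IsStartElement)
                {
                    yield return reader.GetText();
                }
            }
        }
    }
}
```

    Because the method is an iterator, callers can stop enumerating early without reading the rest of the sheet.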

    So, my extra question is how to parse only rows with data using SAX approach

    You are only getting the rows that have data (or at least the rows that exist in the XML) using the SDK. You appear to have asked this as a separate question, which I've answered in more detail, but essentially you are seeing the start and end of each row element using the code in your question. See my answer to your "Why does OpenXML read rows twice" question for more details.
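
    To illustrate the start/end point, here is a minimal sketch (SaxRowCounter and CountRows are hypothetical names) that counts a row only when the reader is positioned on its start. Without the IsStartElement check, a row that contains cells would be matched once at its opening tag and once at its closing tag.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, named for illustration only.
public static class SaxRowCounter
{
    // Counts the row elements present in the worksheet XML.
    public static int CountRows(WorksheetPart worksheetPart)
    {
        int rows = 0;
        using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
        {
            while (reader.Read())
            {
                // Count the row at its start only, not again at its end.
                if (reader.ElementType == typeof(Row) && reader.IsStartElement)
                {
                    rows++;
                }
            }
        }
        return rows;
    }
}
```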

    So, my final extra question is how to get the sheet you want.

    You need to find the Sheet by name; it is a descendant of the Workbook. Once you have it, you can use its Id to get the WorksheetPart:

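    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    // filename and sheetName are supplied by the caller.
    using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filename, false))
    {
        WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
    
        // FirstOrDefault returns null when no sheet matches; First would throw instead,
        // so the null check below would never be reached.
        Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().FirstOrDefault(s => s.Name == sheetName);
        if (sheet != null)
        {
            // The sheet's Id is the relationship id of its WorksheetPart.
            WorksheetPart worksheetPart = workbookPart.GetPartById(sheet.Id) as WorksheetPart;
    
            //read worksheetPart...
        }
    }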
    