问题
I am attempting to write out an Excel file from an ASP.NET web server using OpenXML. I have about 2100 records and its taking around 20-30 seconds to do this. Any way I can make it faster? Retrieving the 2100 rows from the db takes a fraction of a second. Not sure why manipulating them in memory would take any longer.
Note: ExcelWriter is our custom class, but all its methods are directly from code in this link, http://msdn.microsoft.com/en-us/library/cc861607.aspx
public static MemoryStream CreateThingReport(List<Thing> things, MemoryStream template)
{
SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(template, true);
WorksheetPart workSheetPart = spreadsheet.WorkbookPart.WorksheetParts.First();
SharedStringTablePart sharedStringPart = spreadsheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();
Cell cell = null;
int index = 0;
//create cell formatting for header text
Alignment wrappedAlignment = new Alignment { WrapText = true };
uint rowOffset = 2;
foreach (Thing t in things)
{
//Received Date
cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
index = ExcelWriter.InsertSharedStringItem(t.CreateDate.ToShortDateString(), sharedStringPart);
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);
//Car Part Name
cell = ExcelWriter.InsertCellIntoWorksheet("B", rowOffset, workSheetPart);
index = ExcelWriter.InsertSharedStringItem(t.CarPart.Name, sharedStringPart);
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);
rowOffset++;
}
workSheetPart.Worksheet.Save();
spreadsheet.WorkbookPart.Workbook.Save();
spreadsheet.Close();
return template;
回答1:
So it looks like someone in the MSDN community docs ran into similar performance implications. The code below is very inefficient. Someone recommended using a hash table.
For our solution we just removed the insertion of shared strings altogether and went from 1:03 seconds to 0:03 seconds in download time.
//Old: (1:03)
cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
index = ExcelWriter.InsertSharedStringItem(thing.CreateDate.ToShortDateString(), sharedStringPart);
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);
//New: (0:03)
cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
cell.CellValue = new CellValue(thing.CreateDate.ToShortDateString());
cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.String);
MSDN Docs (slow solution, they should use a Hash Table instead)
private static int InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
// If the part does not contain a SharedStringTable, create one.
if (shareStringPart.SharedStringTable == null)
{
shareStringPart.SharedStringTable = new SharedStringTable();
}
int i = 0;
// Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
if (item.InnerText == text)
{
return i;
}
i++;
}
// The text does not exist in the part. Create the SharedStringItem and return its index.
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
shareStringPart.SharedStringTable.Save();
return i;
}
回答2:
@The Internet
Note that String data type is actually for formulas, for text should use InlineString. See 17.18.11 ST_CellType (Cell Type):
- inlineStr (Inline String) - Cell containing an (inline) rich string, i.e., one not in the shared string table. If this cell type is used, then the cell value is in the is element rather than the v element in the cell (c element).
- str (String) - Cell containing a formula string.
回答3:
The big improment is more Save() function out of loop
//Save data
shareStringPart.SharedStringTable.Save();
worksheetPart.Worksheet.Save();
For 500 records, for me it change from 10 mins to 1 min.
回答4:
@kunjee
If you want performance create all required objects upfront so that are not checked on each invocation of this method. This is why SharedStringTable is passed in as parameter instead of the part.
Dictionaries are for fast, indexed lookup, have better performance than a for loop. Are bit faster than hashtables because are strongly typed so don't require boxing. Being strongly typed is a great benefit anyway.
private static int InsertSharedStringItem(string sharedString, SharedStringTable sharedStringTable, Dictionary<string, int> sharedStrings)
{
int sharedStringIndex;
if (!sharedStrings.TryGetValue(sharedString, out sharedStringIndex))
{
// The text does not exist in the part. Create the SharedStringItem now.
sharedStringTable.AppendChild(new SharedStringItem(new Text(sharedString)));
sharedStringIndex = sharedStrings.Count;
sharedStrings.Add(sharedString, sharedStringIndex);
}
return sharedStringIndex;
}
回答5:
As mentioned by The Internet, they should have used a Hashtable and as proposed by zquanghoangz they should have moved the Save() out of the loop.
InlineString does work, but it gives MS Excel a headache when opening the generated file with uninformative error messages which can be repaired, but still gives an annoying pop-up.
static Cell AddCellWithSharedStringText(
[NotNull]string text,
[NotNull]Hashtable texts,
[NotNull]SharedStringTablePart shareStringPart
)
{
if (!texts.ContainsKey(text))
{
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
texts[text] = texts.Count;
}
var idx = (int)texts[text];
Cell c1 = new Cell();
c1.DataType = CellValues.SharedString;
c1.CellValue = new CellValue(idx.ToString());
return c1;
}
This solution brought the export time down from ~5 minutes to 6 seconds on a [9880 x 66] grid.
来源:https://stackoverflow.com/questions/10542719/net-openxml-performance-issues