Question
I'm reading an Excel file with ClosedXML and looping through the rows, deleting those that meet a condition:
using (var wb = new XLWorkbook(path))
{
    var ws = wb.Worksheet(sheet);
    int deleted = 0;
    for (int row_i = 2; row_i <= ws.LastRowUsed().RowNumber(); row_i++)
    {
        // rows below a deleted row have shifted up, so offset the index
        ExcelRow row = new ExcelRow(ws.Row(row_i - deleted));
        row.styleCol = header.styleCol;
        K key = keyReader(row);
        // writeData deletes the row and returns false when the condition is met
        if (!writeData(row, dict[key])) deleted++;
    }
    wb.Save();
}
The code is very slow for a file with thousands of rows, both when no rows are deleted and when hundreds of rows must be deleted.
Answer 1:
First, please read Eric Lippert's performance rant: https://ericlippert.com/2012/12/17/performance-rant/
As for optimisation potential:
The bottleneck should be the disk. Unless you have something like a RAID 0 of SSDs, or some serious computation in keyReader or those dictionaries, there is no way the CPU will be a relevant factor. So the most important thing is never to retrieve the same value twice.
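The advice to never retrieve the same value twice can be sketched with a small caching wrapper, assuming the repeated work is a side-effect-free lookup: each distinct input is computed once and later requests are served from a dictionary. `Memoize` and `slowSquare` are hypothetical names; `slowSquare` only stands in for something like `keyReader` or a dictionary probe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps a lookup so that each distinct input is computed
// only once; repeated inputs are answered from the cache instead.
static Func<TIn, TOut> Memoize<TIn, TOut>(Func<TIn, TOut> lookup) where TIn : notnull
{
    var cache = new Dictionary<TIn, TOut>();
    return input =>
    {
        if (!cache.TryGetValue(input, out var value))
        {
            value = lookup(input);   // first time: do the expensive work
            cache[input] = value;
        }
        return value;                // afterwards: reuse the stored result
    };
}

int calls = 0;
// Hypothetical stand-in for an expensive lookup; counts how often it really runs.
Func<int, int> slowSquare = k => { calls++; return k * k; };
Func<int, int> fastSquare = Memoize(slowSquare);

int sum = 0;
foreach (int n in new[] { 3, 3, 4, 3, 4 })
    sum += fastSquare(n);

Console.WriteLine($"sum = {sum}, underlying calls = {calls}");
```

Here five requests cost only two underlying calls. Caching like this is only safe when the lookup has no side effects and its inputs do not change between calls.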
If you want to eliminate the computation time, you could implement some deferred background loading of the next column. You should easily be able to replace direct access with an Enumerator. This will reduce the execution time basically down to disk speed.
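Deferred background loading over an enumerator might look like the sketch below, assuming the source is read strictly in order and tolerates having MoveNext called from a thread-pool thread (whether a given library's enumerator allows that is an assumption worth checking). While the caller processes item i, item i+1 is already being fetched. `Prefetch` and `SlowRows` are hypothetical; `SlowRows` simulates a slow reader with `Thread.Sleep`, and none of this is part of ClosedXML.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: yields the items of 'source' in order, but starts fetching
// item i+1 on a thread-pool thread while the caller is still processing item i.
// MoveNext calls never overlap: each starts only after the previous one finished.
static IEnumerable<T> Prefetch<T>(IEnumerable<T> source)
{
    IEnumerator<T> e = source.GetEnumerator();
    Task<bool> pending = Task.Run(() => e.MoveNext());
    try
    {
        while (pending.Result)
        {
            T current = e.Current;
            pending = Task.Run(() => e.MoveNext()); // fetch the next item in the background
            yield return current;                   // the caller processes 'current' meanwhile
        }
    }
    finally
    {
        // If the caller stops early, let the in-flight MoveNext finish before disposing.
        try { pending.Wait(); }
        catch (AggregateException) { /* already surfaced via pending.Result, or no longer needed */ }
        e.Dispose();
    }
}

// Simulated slow reader (hypothetical): each "row" takes about 50 ms to load.
static IEnumerable<int> SlowRows(int count)
{
    for (int i = 1; i <= count; i++)
    {
        Thread.Sleep(50);
        yield return i;
    }
}

var processed = new List<int>();
foreach (int row in Prefetch(SlowRows(5)))
{
    Thread.Sleep(50);   // simulated per-row work, overlapping with loading the next row
    processed.Add(row);
}
Console.WriteLine(string.Join(",", processed));
```

The gain comes only from overlapping the loading of one item with the processing of the previous one; if either side dominates, the total time approaches that side alone.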
Answer 2:
There are two important optimizations to make. The first is trivial but has a great impact: store the last row number in a variable, because the function that computes it is more expensive than you might expect.
int lastrow = ws.LastRowUsed().RowNumber();
for (int row_i = 2; row_i <= lastrow; row_i++)
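To see why storing the bound matters, here is a toy cost model. It is hypothetical: a `bool[]` stands in for the sheet, and the local `LastRowUsed` is assumed to scan every row on each call, which is not ClosedXML's actual implementation. With the call in the loop condition, the scan repeats on every iteration; stored in a variable, it runs once.

```csharp
using System;

// Toy stand-in for a worksheet (hypothetical): used[r] is true if row r has data.
bool[] used = new bool[1001];
for (int r = 1; r <= 1000; r++) used[r] = true;

long rowsScanned = 0;

// Assumed cost model: every call scans all rows to find the last used one.
int LastRowUsed()
{
    int last = 0;
    for (int r = 1; r < used.Length; r++)
    {
        rowsScanned++;
        if (used[r]) last = r;
    }
    return last;
}

// Bound re-evaluated on every iteration, as in the question's loop.
rowsScanned = 0;
int visitedA = 0;
for (int row_i = 2; row_i <= LastRowUsed(); row_i++) visitedA++;
long costA = rowsScanned;

// Bound stored once before the loop.
rowsScanned = 0;
int visitedB = 0;
int lastrow = LastRowUsed();
for (int row_i = 2; row_i <= lastrow; row_i++) visitedB++;
long costB = rowsScanned;

Console.WriteLine($"re-evaluated: {visitedA} rows visited, {costA} rows scanned");
Console.WriteLine($"stored:       {visitedB} rows visited, {costB} rows scanned");
```

Both loops visit the same 999 rows, but under this model the re-evaluated bound scans 1,000,000 rows against 1,000 for the stored one.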
The second is a bit more involved, and it concerns the multiple (and slow) row/cell shifts (XLShiftDeletedCells.ShiftCellsUp) that happen when rows are deleted one at a time rather than as a single range. In that case I can suggest a workaround. Do not delete the individual row inside your writeData (notice that, as a consequence, you no longer subtract deleted from your loop index):
ExcelRow row = new ExcelRow(ws.Row(row_i)); // no deletion in the loop
Instead, temporarily add a column (temp_col) in which you mark each row as "ok" or "skip", and at the end sort on it, so that you can delete all the rows in a single range.
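if (deleted > 0)   // deleted = number of rows marked "skip"
{
    int lastcol = ws.LastColumnUsed().ColumnNumber();
    // sort the data rows (below the header) on the marker column:
    // "ok" sorts before "skip", so the rows to delete end up at the bottom
    var tab = ws.Range(ws.Cell(2, 1), ws.Cell(lastrow, lastcol));
    tab.Sort(temp_col);
    // delete the bottom block of "skip" rows with a single shift
    tab = ws.Range(ws.Cell(lastrow - deleted + 1, 1), ws.Cell(lastrow, lastcol));
    tab.Delete(XLShiftDeletedCells.ShiftCellsUp);
}
// remove the temporary marker column
ws.Column(temp_col).Delete();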
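The same idea on an in-memory list, as a sketch (the data and the divisible-by-3 rule are hypothetical): deleting rows one at a time shifts everything below each deleted row, while marking, sorting on the mark and removing the bottom block needs a single range removal. The sketch uses LINQ's `OrderBy`, which is a stable sort, so the kept rows stay in their original order; the spreadsheet version relies on the same property, so it is worth confirming for the sort you use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy data (hypothetical): rows 1..10; rows whose value is divisible by 3 must go.
List<int> original = Enumerable.Range(1, 10).ToList();
static bool ShouldDelete(int v) => v % 3 == 0;

// 1) One row at a time: each RemoveAt shifts every row below it up by one.
var perRow = new List<int>(original);
long shiftedPerRow = 0;
for (int i = 0; i < perRow.Count; )
{
    if (ShouldDelete(perRow[i]))
    {
        shiftedPerRow += perRow.Count - i - 1;  // rows moved up by this deletion
        perRow.RemoveAt(i);                      // keep i: the next row moved into slot i
    }
    else
    {
        i++;
    }
}

// 2) Mark, sort on the mark, then delete one block at the bottom.
var marked = original.Select(v => (Value: v, Mark: ShouldDelete(v) ? "skip" : "ok")).ToList();
int deleted = marked.Count(r => r.Mark == "skip");
// OrderBy is stable: "ok" rows keep their original relative order.
var sorted = marked.OrderBy(r => r.Mark, StringComparer.Ordinal).ToList();
sorted.RemoveRange(sorted.Count - deleted, deleted);   // single range removal at the end
List<int> batch = sorted.Select(r => r.Value).ToList();

Console.WriteLine(string.Join(",", perRow));
Console.WriteLine(string.Join(",", batch));
Console.WriteLine($"rows shifted one at a time: {shiftedPerRow}");
```

For these 10 rows the one-at-a-time version moves 12 rows in total (7 + 4 + 1), and both versions end with 1, 2, 4, 5, 7, 8, 10.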
Performance Test
No need to add anything about the first point. The second is original to this answer, and I can confirm, by measuring the elapsed time with a Stopwatch, that the execution time dropped by more than 80% in my situation (from 200 to 27 seconds).
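A measurement like this can be reproduced with a minimal `Stopwatch` harness such as the sketch below; `Measure` and `ReductionPercent` are hypothetical helpers, and `Thread.Sleep` is only a placeholder for the code being timed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Runs 'action' once and returns how long it took (wall-clock time).
static TimeSpan Measure(Action action)
{
    var sw = Stopwatch.StartNew();
    action();
    sw.Stop();
    return sw.Elapsed;
}

// Percentage of time saved going from 'before' to 'after'.
static double ReductionPercent(double before, double after) => 100.0 * (before - after) / before;

// Placeholder workload; in practice, the open/loop/save sequence being compared.
TimeSpan elapsed = Measure(() => Thread.Sleep(100));
Console.WriteLine($"elapsed: {elapsed.TotalMilliseconds:F0} ms");
Console.WriteLine($"200 s -> 27 s, reduction: {ReductionPercent(200, 27):F1}%");
```

With the reported figures, (200 - 27) / 200 gives a reduction of 86.5%.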
Source: https://stackoverflow.com/questions/61749972/optimizing-performance-for-closedxml-loops-and-row-deletion