I would like to create a "reduced" version of an Excel (xlsx) spreadsheet (i.e. by removing some rows according to some criterion), and I'd like to know if this can be done.
Internally, openpyxl does not seem to have a concept of 'rows'; it works with cells and keeps track of the dimensions, and if you use Worksheet.rows it calculates a 2D array of cells from that. You can mutate that array, but doing so doesn't change the Worksheet.
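A quick way to see this with a current openpyxl: materialise the computed rows into a plain list, drop one, and check that the worksheet itself is untouched. This uses Worksheet.iter_rows(), which yields the same row tuples that Worksheet.rows does; the sample data is made up for illustration.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for row in [("a", 1), ("b", 2), ("c", 3)]:
    ws.append(row)

# Copy the computed rows into a mutable list of lists of Cell objects.
rows = [list(r) for r in ws.iter_rows()]
del rows[0]  # removes the first row from our copy only

print(len(rows))        # 2: the copy lost a row
print(ws.max_row)       # 3: the worksheet still has all three rows
print(ws["A1"].value)   # a: and A1 is unchanged
```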
If you want to do this within the Worksheet, you need to copy the values from the old position to the new position, set the value of the cells that are no longer needed to '' or None, and call Worksheet.garbage_collect().
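The copy-and-blank approach could look roughly like the sketch below. delete_row_by_copying is a hypothetical helper, not an openpyxl API, and it only moves cell values (no styles, comments, merged ranges or formula references). Worksheet.garbage_collect() belongs to older openpyxl releases and I'm not sure it still exists, so the sketch skips it; as a result the blanked cells still count toward ws.max_row.

```python
from openpyxl import Workbook


def delete_row_by_copying(ws, idx):
    """Remove row `idx` (1-based) by shifting the values of later rows up by one.

    Hypothetical helper: only cell values are moved, nothing else.
    """
    last_row = ws.max_row
    last_col = ws.max_column
    # Each row from idx onwards takes the values of the row below it.
    for row in range(idx, last_row):
        for col in range(1, last_col + 1):
            ws.cell(row=row, column=col).value = ws.cell(row=row + 1, column=col).value
    # The last row is now a duplicate of the one above, so blank it out.
    for col in range(1, last_col + 1):
        ws.cell(row=last_row, column=col).value = None


wb = Workbook()
ws = wb.active
for row in [("a", 1), ("b", 2), ("c", 3)]:
    ws.append(row)

delete_row_by_copying(ws, 2)
result = [tuple(cell.value for cell in r) for r in ws.iter_rows()]
print(result)  # [('a', 1), ('c', 3), (None, None)]
```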
If your dataset is small and of uniform nature (e.g. all strings), you might be better off copying the relevant cells (their content) to a new worksheet, removing the old one, and setting the title of the new one to the title of the one you just deleted.
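The copy-to-a-new-sheet route might be sketched like this. rebuild_sheet and its keep predicate are hypothetical names for illustration; the code assumes openpyxl 2.6 or later, where iter_rows() accepts values_only=True. Only values survive the copy, which is why this fits small, uniform data best.

```python
from openpyxl import Workbook


def rebuild_sheet(wb, title, keep):
    """Replace sheet `title` with a copy containing only rows where keep(values) is true."""
    old = wb[title]
    position = wb.sheetnames.index(title)
    # Put the new sheet where the old one is, under a temporary name.
    new = wb.create_sheet(title=title + "_tmp", index=position)
    for values in old.iter_rows(values_only=True):
        if keep(values):
            new.append(values)
    wb.remove(old)
    new.title = title  # the old title is free again once the sheet is removed
    return new


wb = Workbook()
ws = wb.active
ws.title = "Data"
for row in [("a", 1), ("b", 2), ("c", 3)]:
    ws.append(row)

new_ws = rebuild_sheet(wb, "Data", keep=lambda values: values[1] != 2)
print(wb.sheetnames)                              # ['Data']
print(list(new_ws.iter_rows(values_only=True)))   # [('a', 1), ('c', 3)]
```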
The most elegant thing to do, IMHO, would be to extend Worksheet or a subclass with a delete_rows method. I would implement such a method by changing the coordinates of its Cells in place, but this could break if openpyxl internals change.
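To show what "changing the coordinates" means without touching openpyxl's private internals, here is a toy model: a hypothetical SparseSheet class that stores values in a dict keyed by 1-based (row, column), the way a sparse cell store might. It is not openpyxl's Worksheet; deleting rows just rewrites the keys of the cells below the gap.

```python
class SparseSheet:
    """Toy cell store keyed by (row, col), 1-based like Excel (hypothetical, not openpyxl)."""

    def __init__(self):
        self.cells = {}  # (row, col) -> value

    def set(self, row, col, value):
        self.cells[(row, col)] = value

    def get(self, row, col):
        return self.cells.get((row, col))

    def delete_rows(self, idx, amount=1):
        """Drop rows idx .. idx+amount-1 and shift every later row up by `amount`."""
        moved = {}
        for (row, col), value in self.cells.items():
            if row < idx:
                moved[(row, col)] = value           # above the gap: unchanged
            elif row >= idx + amount:
                moved[(row - amount, col)] = value  # below the gap: new coordinates
            # cells inside the gap are not copied, i.e. they are deleted
        self.cells = moved


sheet = SparseSheet()
for r, name in enumerate(["a", "b", "c", "d"], start=1):
    sheet.set(r, 1, name)

sheet.delete_rows(2, 2)  # remove rows 2 and 3
print(sheet.get(1, 1), sheet.get(2, 1), sheet.get(3, 1))  # a d None
```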
2018 update: I was searching for how to delete a row today and found that the functionality was added in openpyxl 2.5.0-b2. I just tried it and it worked perfectly. Here's the link where I found the answer: https://bitbucket.org/openpyxl/openpyxl/issues/964/delete_rows-does-not-work-on-deleting
And here's the syntax to delete one row:
ws.delete_rows(index, 1)
where 'ws' is the worksheet, 'index' is the row number (1-based, as in Excel), and '1' is the number of rows to delete.
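Going back to the original question (removing rows by some criterion), one way to use delete_rows is to walk the rows from the bottom up, so deleting a row never shifts the indices of rows not yet examined. delete_rows_where and the sample data are hypothetical; values_only=True needs openpyxl 2.6 or later. As far as I know, openpyxl does not adjust formulas, tables or charts that refer to the moved cells.

```python
from openpyxl import Workbook


def delete_rows_where(ws, predicate):
    """Delete every row whose list of cell values satisfies `predicate`.

    Iterates bottom-up so earlier deletions don't renumber rows still to be checked.
    """
    for row_idx in range(ws.max_row, 0, -1):
        values = [cell.value for cell in ws[row_idx]]
        if predicate(values):
            ws.delete_rows(row_idx, 1)


wb = Workbook()
ws = wb.active
for row in [("keep", 1), ("drop", 2), ("keep", 3), ("drop", 4)]:
    ws.append(row)

delete_rows_where(ws, lambda values: values[0] == "drop")
remaining = list(ws.iter_rows(values_only=True))
print(remaining)  # [('keep', 1), ('keep', 3)]
```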
There's also a delete_cols method for deleting columns, but I haven't tried that.
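For reference, the column counterpart is Worksheet.delete_cols(idx, amount=1), which takes a 1-based column index rather than a letter. A minimal sketch with made-up data:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["name", "unused", "score"])
ws.append(["a", "x", 1])

# Remove column 2 (B); the columns to its right shift left by one.
ws.delete_cols(2, 1)
result = list(ws.iter_rows(values_only=True))
print(result)  # [('name', 'score'), ('a', 1)]
```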