问题
TL;DR;
- Using Openpyxl to save changes to a large excel file results in a corrupted xlsx file
- The Excel file is made of several tabs with graphs, formulae, and images, and tables.
- Powershell script can save edits to the xlsx file with no issues.
- I can read the cell values from the excel file with Openpyxl, I am also able to edit & save the xlsx file manually.
- Excel file is unprotected.
- All errors and code snippets have been provided below.
I'm unable to add data to an excel file that one of our teams is using. The excel file is fairly big(at +3MB), has several sheets, contains formulas and graphs and also has images.
Thankfully the sheet I need to enter data to has none of that, however, I found that when I try to save the workbook, I end up with these errors:
Traceback (most recent call last):
File "test.py", line 5, in <module>
wb.save("new.xlsx")
File "C:\Python3\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 78, in write_data
self._write_charts()
File "C:\Python3\lib\site-packages\openpyxl\writer\excel.py", line 124, in _write_charts
self._archive.writestr(chart.path[1:], tostring(chart._write()))
File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 134, in _write
return cs.to_tree()
File "C:\Python3\lib\site-packages\openpyxl\chart\chartspace.py", line 193, in to_tree
tree = super(ChartSpace, self).to_tree()
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\chart\plotarea.py", line 135, in to_tree
return super(PlotArea, self).to_tree(tagname)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
for node in nodes:
File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 105, in to_tree
el = v.to_tree(namespace=namespace)
File "C:\Python3\lib\site-packages\openpyxl\chart\_chart.py", line 107, in to_tree
return super(ChartBase, self).to_tree(tagname, idx)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 146, in to_tree
for node in nodes:
File "C:\Python3\lib\site-packages\openpyxl\descriptors\sequence.py", line 39, in to_tree
el = v.to_tree(tagname, idx)
File "C:\Python3\lib\site-packages\openpyxl\chart\series.py", line 170, in to_tree
return super(Series, self).to_tree(tagname)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
File "C:\Python3\lib\site-packages\openpyxl\descriptors\serialisable.py", line 154, in to_tree
node = obj.to_tree(child_tag)
AttributeError: 'str' object has no attribute 'to_tree'
This is the code I used to perform the "save as" procedure, so never mind about adding data and whatnot, the save action corrupts the file:
from openpyxl import load_workbook
wb=load_workbook("Production Monitoring Script.xlsx")
ws=wb['Prod Perf Script Data']
wb.save("new.xlsx")
I tried an alternative solution with Powershell and it worked.
$xl=New-Object -ComObject Excel.Application
$wb=$xl.WorkBooks.Open('<path here>\Production Monitoring Script.xlsx')
$ws=$wb.WorkSheets.item(1)
$xl.Visible=$true
$ws.Cells.Item(7, 618)=50
$wb.SaveAs('<path here>\New.xlsx')
$xl.Quit()
It was able to save the value "50" in that cell.
回答1:
As discussed in comments on original post: some graphics and other items are not supported by openpyxl, even if they are in worksheets not modified by your code. This is not a full workaround, but works when the unsupported objects are in other worksheets only.
I made an example .xlsx workbook with two worksheets, 'TWC' and 'UV240 Results'. This code assumes that any worksheet whose title ends in 'Results' contains the unsupported images, and creates two temporary files - imageoutput contains the unsupported images, and outputtemp contains the worksheets that may be modified without corruption by openpyxl. Then they're stitched together at the end.
It may be a inefficient in parts; please edit or comment with improvements!
import os
import shutil
import win32com.client
from openpyxl import load_workbook
name = 'spreadsheet.xlsx'
outputfile = 'output.xlsx'
outputtemp = 'outputtemp.xlsx'
shutil.copyfile(name, 'output.xlsx')
wb = load_workbook('output.xlsx')
ws = wb['TWC']
# TWC doesn't have images. Anything ending with 'Results' has unsupported images etc
# Create new file with only openpyxl-unsupported worksheets
imageworksheets = [ws if ws.title.endswith('Results') else '' for ws in wb.worksheets]
if [ws for ws in wb if ws.title != 'TWC']:
imageoutput = 'output2.xlsx'
imagefilewritten = False
while not imagefilewritten:
try:
shutil.copy(name, imageoutput)
except PermissionError as error:
# Catch an exception here - I usually have a GUI function
pass
else:
imagefilewritten = True
excel = win32com.client.Dispatch('Excel.Application')
excel.Visible = False
imagewb = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
excel.DisplayAlerts = False
for i, ws in enumerate(imageworksheets[::-1]): # Go backwards to avoid reindexing
if not ws:
wsindex = len(imageworksheets) - i
imagewb.Worksheets(wsindex).Delete()
imagefileupdated = False
while not imagefileupdated:
try:
imagewb.Save()
imagewb.Close(SaveChanges = True)
print('Temp image workbook saved.')
except PermissionError as error:
# Catch exception
pass
else:
imagefileupdated = True
# Remove the unsupported worksheets in openpyxl
for ws in wb.worksheets:
if ws in imageworksheets:
wb.remove(ws)
wb.save(outputtemp)
print('Temp output workbook saved.')
''' Do your desired openpyxl manipulations on the remaining supported worksheet '''
# Merge the outputtemp and imageoutput into outputfile
wb1 = excel.Workbooks.Open(os.path.join(os.getcwd(), outputtemp))
wb2 = excel.Workbooks.Open(os.path.join(os.getcwd(), imageoutput))
for ws in wb1.Sheets:
ws.Copy(wb2.Sheets(1))
wb2.SaveAs(os.path.join(os.getcwd(), outputfile))
wb1.Close(SaveChanges = True)
wb2.Close(SaveChanges = True)
print(f'Output workbook saved as {outputfile}.')
excel.Visible = True
excel.DisplayAlerts = True
来源:https://stackoverflow.com/questions/65563297/openpyxl-corrupts-xlsx-on-save-even-when-no-changes-were-made