Question
I have a program which exports a System.Data.DataTable to an XLSX / OpenXml spreadsheet. I finally have it mostly working. However, when I open the spreadsheet in Excel, Excel complains that the file is invalid and needs repair, giving this message...
We found a problem with some content in . Do you want us to try to recover as much as we can? If you trust the source of the workbook, click Yes.
If I click Yes, it comes back with this message...
Clicking the log file and opening that, just shows this...
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>error268360_01.xml</logFileName>
<summary>Errors were detected in file 'C:\Users\aabdi\AppData\Local\Temp\data.20190814.152538.xlsx'</summary>
<repairedRecords>
<repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
</repairedRecords>
</recoveryLog>
Obviously, we don't want to deploy this to a production environment in this state, so I've been trying to figure out how to fix it. I threw together a quick little sample to validate the XML and show the errors, based on this link from MSDN. But when I run the program and load the exact same XLSX document that Excel complains about, the validator reports that the file is perfectly valid. So I'm not sure where to go from here.
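For reference, a validation sample of this kind might look like the sketch below. It assumes the Open XML SDK's OpenXmlValidator; `CollectErrors` is a hypothetical helper name, and the choice of FileFormatVersions.Office2007 is only an assumption about which target version to try. The source does not confirm that any particular version flags this file.

```vbnet
Imports System
Imports System.Collections.Generic
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Validation

Module ValidateXlsxSketch
    ' Hypothetical helper: runs the SDK validator and returns one line per reported error.
    Public Function CollectErrors(doc As SpreadsheetDocument) As List(Of String)
        ' Assumption: validate against the Office 2007 schema; other FileFormatVersions values can be tried.
        Dim validator As New OpenXmlValidator(FileFormatVersions.Office2007)
        Dim lines As New List(Of String)()
        For Each info As ValidationErrorInfo In validator.Validate(doc)
            lines.Add($"{info.Part.Uri} {info.Path.XPath}: {info.Description}")
        Next
        Return lines
    End Function

    Sub Main(args As String())
        ' args(0) is the path to the XLSX file to check.
        Using doc As SpreadsheetDocument = SpreadsheetDocument.Open(args(0), False)
            Dim errors = CollectErrors(doc)
            For Each line As String In errors
                Console.WriteLine(line)
            Next
            Console.WriteLine($"{errors.Count} validation error(s) reported.")
        End Using
    End Sub
End Module
```

Schema validation only checks the document against the Open XML schema and the SDK's semantic rules, so a file can pass here and still contain cell values that Excel itself rejects.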
Any better tools for trying to validate my XLSX XML? Following is the complete code I'm using to generate the XLSX file. (Yes, it's in VB.NET, it's a legacy app.)
If I comment out the line in the For Each dr As DataRow loop, then the XLSX file opens fine in Excel (just without any data). So it's something with the individual cells, but I'm not really DOING much with them. Setting a value and data type, and that's it.
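Since the recovery log points at cell information in /xl/worksheets/sheet1.xml, one way to narrow this down is to dump each cell's type attribute and raw value and look for combinations Excel might reject. A minimal sketch, assuming the Open XML SDK is referenced; `DescribeCell` and `DumpCells` are hypothetical helper names, not part of the SDK or of the code in this question:

```vbnet
Imports System
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Spreadsheet

Module CellDumpSketch
    ' Hypothetical helper: formats one cell as "t=<type attribute> v=<raw value>".
    Public Function DescribeCell(c As Cell) As String
        Dim t As String = If(c.DataType Is Nothing, "(none)", c.DataType.InnerText)
        Dim v As String = If(c.CellValue Is Nothing, "(none)", c.CellValue.Text)
        Return $"t={t} v={v}"
    End Function

    ' Hypothetical helper: prints every cell of every worksheet in the file.
    Public Sub DumpCells(xlsxPath As String)
        Using doc As SpreadsheetDocument = SpreadsheetDocument.Open(xlsxPath, False)
            For Each wsp As WorksheetPart In doc.WorkbookPart.WorksheetParts
                For Each c As Cell In wsp.Worksheet.Descendants(Of Cell)()
                    Console.WriteLine($"{wsp.Uri}: {DescribeCell(c)}")
                Next
            Next
        End Using
    End Sub
End Module
```

Calling DumpCells with the path of the generated file lists the type/value pairs as they were actually written, which makes it easier to compare the cells of a column that triggers the repair prompt against one that does not.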
I also tried replacing the For Each loop in ConstructDataRow with the following, but it still outputs the same "bad" XML...
rv.Append(
    (From dc In dr.Table.Columns
     Select ConstructCell(
         NVL(dr(dc.Ordinal), String.Empty),
         MapSystemTypeToCellType(dc.DataType)
     )
    ).ToArray()
)
I also tried replacing the call to Append with AppendChild for each cell, but that didn't help either.
The zipped up XLSX file (erroring, with dummy data) is available here:
https://drive.google.com/open?id=1KVVWEqH7VHMxwbRA-Pn807SXHZ32oJWR
Full DataTable to Excel XLSX Code
#Region " ToExcel "
    <Extension>
    Public Function ToExcel(ByVal target As DataTable) As Attachment
        Dim filename = Path.GetTempFileName()
        Using doc As SpreadsheetDocument = SpreadsheetDocument.Create(filename, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook)
            Dim data = New SheetData()
            Dim wbp = doc.AddWorkbookPart()
            wbp.Workbook = New Workbook()
            Dim wsp = wbp.AddNewPart(Of WorksheetPart)()
            wsp.Worksheet = New Worksheet(data)
            Dim sheets = wbp.Workbook.AppendChild(New Sheets())
            Dim sheet = New Sheet() With {.Id = wbp.GetIdOfPart(wsp), .SheetId = 1, .Name = "Data"}
            sheets.Append(sheet)
            data.AppendChild(ConstructHeaderRow(target))
            For Each dr As DataRow In target.Rows
                data.AppendChild(ConstructDataRow(dr)) '// THIS LINE YIELDS THE BAD PARTS
            Next
            wbp.Workbook.Save()
        End Using
        Dim attachmentname As String = Path.Combine(Path.GetDirectoryName(filename), $"data.{Now.ToString("yyyyMMdd.HHmmss")}.xlsx")
        File.Move(filename, attachmentname)
        Return New Attachment(attachmentname, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    End Function

    Private Function ConstructHeaderRow(dt As DataTable) As Row
        Dim rv = New Row()
        For Each dc As DataColumn In dt.Columns
            rv.Append(ConstructCell(dc.ColumnName, CellValues.String))
        Next
        Return rv
    End Function

    Private Function ConstructDataRow(dr As DataRow) As Row
        Dim rv = New Row()
        For Each dc As DataColumn In dr.Table.Columns
            rv.Append(ConstructCell(NVL(dr(dc.Ordinal), String.Empty), MapSystemTypeToCellType(dc.DataType)))
        Next
        Return rv
    End Function

    Private Function ConstructCell(value As String, datatype As CellValues) As Cell
        Return New Cell() With {
            .CellValue = New CellValue(value),
            .DataType = datatype
        }
    End Function

    Private Function MapSystemTypeToCellType(t As System.Type) As CellValues
        Dim rv As CellValues
        Select Case True
            Case t Is GetType(String)
                rv = CellValues.String
            Case t Is GetType(Date)
                rv = CellValues.Date
            Case t Is GetType(Boolean)
                rv = CellValues.Boolean
            Case IsNumericType(t)
                rv = CellValues.Number
            Case Else
                rv = CellValues.String
        End Select
        Return rv
    End Function
#End Region
Answer 1:
For anyone else coming in and finding this, I finally tracked this down to the Cell.DataType. Setting a value of CellValues.Date will cause Excel to want to "fix" the document. (Apparently, for dates the DataType should be NULL, and Date was only used in Office 2010.)
Also, if you specify a DataType of CellValues.Boolean, then the CellValue needs to be either 0 or 1. "true" / "false" will also cause Excel to want to "fix" your spreadsheet.
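A minimal sketch of a cell builder that follows these two findings, under stated assumptions: `ConstructTypedCell` is a hypothetical replacement for ConstructCell that takes the raw value instead of a pre-converted string, and writing dates as their OLE Automation serial number is one common approach, not something the answer itself spells out. A cell written this way shows up in Excel as a plain number unless a date number format is applied through a cell style, which this sketch leaves out.

```vbnet
Imports System
Imports System.Globalization
Imports DocumentFormat.OpenXml.Spreadsheet

Module TypedCellSketch
    ' Hypothetical replacement for ConstructCell: chooses the cell type from the runtime value.
    Public Function ConstructTypedCell(value As Object) As Cell
        If value Is Nothing OrElse value Is DBNull.Value Then
            Return New Cell() With {.CellValue = New CellValue(String.Empty), .DataType = CellValues.String}
        End If

        If TypeOf value Is Boolean Then
            ' Boolean cells must contain 0 or 1, not "True" / "False".
            Return New Cell() With {
                .CellValue = New CellValue(If(DirectCast(value, Boolean), "1", "0")),
                .DataType = CellValues.Boolean
            }
        End If

        If TypeOf value Is Date Then
            ' Assumption: store the date as a serial number and leave DataType unset.
            Dim serial As Double = DirectCast(value, Date).ToOADate()
            Return New Cell() With {
                .CellValue = New CellValue(serial.ToString(CultureInfo.InvariantCulture))
            }
        End If

        ' Everything else falls back to a culture-invariant string in this sketch.
        Return New Cell() With {
            .CellValue = New CellValue(Convert.ToString(value, CultureInfo.InvariantCulture)),
            .DataType = CellValues.String
        }
    End Function
End Module
```

In ConstructDataRow this would be called as ConstructTypedCell(dr(dc.Ordinal)), so the Boolean and Date values are converted before they are turned into text. Numeric columns would still need their own branch that writes CellValues.Number.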
Also, Microsoft has a better validator tool already built for download here:
https://www.microsoft.com/en-us/download/details.aspx?id=30425
Source: https://stackoverflow.com/questions/57502096/xlsx-file-via-openxml-sdk-both-valid-and-invalid