XLSX file via OpenXml SDK Both Valid and Invalid

不打扰是莪最后的温柔 提交于 2019-12-24 19:59:29

问题


I have a program which exports a System.Data.DataTable to an XLSX / OpenXml Spreadsheet. Finally have it mostly working. However when opening the Spreadsheet in Excel, Excel complains about the file being invalid, and needing repair, giving this message...

We found a problem with some content in . Do you want us to try to recover as much as we can? If you trust the source of the workbook, clik Yes.

If I click Yes, it comes back with this message...

Clicking the log file and opening that, just shows this...

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <logFileName>error268360_01.xml</logFileName>
        <summary>Errors were detected in file 'C:\Users\aabdi\AppData\Local\Temp\data.20190814.152538.xlsx'</summary>
        <repairedRecords>
            <repairedRecord>Repaired Records: Cell information from /xl/worksheets/sheet1.xml part</repairedRecord>
        </repairedRecords>
    </recoveryLog> 

Obviously, we don't want to deploy this into a production environment like this. So I've been trying to figure out how to fix this. I threw together a quick little sample to validate the XML and show the errors, based on this link from MSDN. But when I run the program and load the exact same XLSX document that Excel complains about, the Validator comes back saying that the file is perfectly Valid. So I'm not sure where else to go from there.

Any better tools for trying to validate my XLSX XML? Following is the complete code I'm using to generate the XLSX file. (Yes, it's in VB.NET, it's a legacy app.)

If I comment out the line in the For Each dr As DataRow loop, then the XLSX file opens fine in Excel, (just without any data). So it's something with the individual cells, but I'm not really DOING much with them. Setting a value and data type, and that's it.

I also tried replacing the For Each loop in ConstructDataRow with the following, but it still outputs the same "bad" XML...

        rv.Append(
            (From dc In dr.Table.Columns
             Select ConstructCell(
                 NVL(dr(dc.Ordinal), String.Empty),
                 MapSystemTypeToCellType(dc.DataType)
             )
            ).ToArray()
        )

Also tried replacing the call to Append with AppendChild for each cell too, but that didn't help either.

The zipped up XLSX file (erroring, with dummy data) is available here:
https://drive.google.com/open?id=1KVVWEqH7VHMxwbRA-Pn807SXHZ32oJWR

Full DataTable to Excel XLSX Code


    #Region " ToExcel "
    <Extension>
    Public Function ToExcel(ByVal target As DataTable) As Attachment
        Dim filename = Path.GetTempFileName()
        Using doc As SpreadsheetDocument = SpreadsheetDocument.Create(filename, DocumentFormat.OpenXml.SpreadsheetDocumentType.Workbook)
            Dim data = New SheetData()

            Dim wbp = doc.AddWorkbookPart()
            wbp.Workbook = New Workbook()
            Dim wsp = wbp.AddNewPart(Of WorksheetPart)()
            wsp.Worksheet = New Worksheet(data)

            Dim sheets = wbp.Workbook.AppendChild(New Sheets())
            Dim sheet = New Sheet() With {.Id = wbp.GetIdOfPart(wsp), .SheetId = 1, .Name = "Data"}
            sheets.Append(sheet)

            data.AppendChild(ConstructHeaderRow(target))
            For Each dr As DataRow In target.Rows
                data.AppendChild(ConstructDataRow(dr)) '// THIS LINE YIELDS THE BAD PARTS
            Next

            wbp.Workbook.Save()
        End Using

        Dim attachmentname As String = Path.Combine(Path.GetDirectoryName(filename), $"data.{Now.ToString("yyyyMMdd.HHmmss")}.xlsx")
        File.Move(filename, attachmentname)
        Return New Attachment(attachmentname, "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    End Function

    Private Function ConstructHeaderRow(dt As DataTable) As Row
        Dim rv = New Row()
        For Each dc As DataColumn In dt.Columns
            rv.Append(ConstructCell(dc.ColumnName, CellValues.String))
        Next
        Return rv
    End Function

    Private Function ConstructDataRow(dr As DataRow) As Row
        Dim rv = New Row()
        For Each dc As DataColumn In dr.Table.Columns
            rv.Append(ConstructCell(NVL(dr(dc.Ordinal), String.Empty), MapSystemTypeToCellType(dc.DataType)))
        Next
        Return rv
    End Function

    Private Function ConstructCell(value As String, datatype As CellValues) As Cell
        Return New Cell() With {
        .CellValue = New CellValue(value),
        .DataType = datatype
        }
    End Function

    Private Function MapSystemTypeToCellType(t As System.Type) As CellValues
        Dim rv As CellValues
        Select Case True
            Case t Is GetType(String)
                rv = CellValues.String
            Case t Is GetType(Date)
                rv = CellValues.Date
            Case t Is GetType(Boolean)
                rv = CellValues.Boolean
            Case IsNumericType(t)
                rv = CellValues.Number
            Case Else
                rv = CellValues.String
        End Select

        Return rv
    End Function
    #End Region

回答1:


For anyone else coming in and finding this, I finally tracked this down to the Cell.DataType

Setting a value of CellValues.Date will cause Excel to want to "fix" the document. (apparently for dates, the DataType should be NULL, and Date was only used in Office 2010).

Also, if you specify a DataType of CellValues.Boolean, then the CellValue needs to be either 0 or 1. "true" / "false" will also cause Excel to want to "fix" your spreadsheet.

Also, Microsoft has a better validator tool already built for download here:
https://www.microsoft.com/en-us/download/details.aspx?id=30425



来源:https://stackoverflow.com/questions/57502096/xlsx-file-via-openxml-sdk-both-valid-and-invalid

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!