How can I debug a corrupt docx file?

粉色の甜心 2021-02-05 07:17

I have an issue where .doc and .pdf files are coming out OK but a .docx file is coming out corrupt.

To solve that, I am trying to debug why the .docx is corrupt.

4 Answers
  • 2021-02-05 07:42

    I used the "Open XML SDK 2.5 Productivity Tool" (http://www.microsoft.com/en-us/download/details.aspx?id=30425) to find a problem with a broken hyperlink reference.

    You have to download/install the SDK first, then the tool. The tool will open and analyze the document for problems.
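
    If installing the SDK is not an option, a dangling relationship ID of that kind can also be looked for by hand. What follows is only a rough Python sketch, not what the Productivity Tool does. The function name find_dangling_relationship_ids is made up for illustration, and it assumes the main part lives at word/document.xml, that the package uses the Transitional namespaces, and that every attribute in the r: namespace holds a relationship ID.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: Transitional (not Strict) OOXML namespaces.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_dangling_relationship_ids(path):
    """Return relationship IDs used in word/document.xml but not declared in its .rels part."""
    with zipfile.ZipFile(path) as archive:
        document = ET.fromstring(archive.read("word/document.xml"))
        try:
            rels_xml = archive.read("word/_rels/document.xml.rels")
        except KeyError:
            # No relationships part at all: every referenced ID is dangling.
            rels_xml = None

    declared = set()
    if rels_xml is not None:
        rels = ET.fromstring(rels_xml)
        declared = {rel.get("Id") for rel in rels.iter(f"{{{PKG_RELS_NS}}}Relationship")}

    # Collect the value of every attribute in the r: namespace (r:id, r:embed, ...).
    used = set()
    for element in document.iter():
        for attr, value in element.attrib.items():
            if attr.startswith(f"{{{R_NS}}}"):
                used.add(value)

    return sorted(used - declared)
```

    An empty list means every ID referenced from the main document is declared. Headers, footers and other parts have their own .rels files and are not covered by this sketch.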

  • 2021-02-05 07:54

    Usually, when a particular XML file contains an error, Word tells you which line of which file the error is on. So I believe the problem comes from either the way the file is zipped or the folder structure.
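
    To see which part a parser would complain about without opening Word, each XML part can be parsed straight from the archive. Here is a minimal Python sketch, assuming the file at least opens as a zip. find_malformed_parts is a hypothetical helper name, and it only checks that the XML is well-formed, not that it is valid against the Open XML schemas.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(path):
    """Return (part name, line, column, message) for each XML part that fails to parse."""
    problems = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Only XML and relationship parts are parsed; images and other binaries are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                line, column = err.position
                problems.append((name, line, column, str(err)))
    return problems
```

    If this returns an empty list and zipfile opened the file without raising, the individual XML files are probably fine and the packaging is the more likely suspect.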

    A .docx file is a zip archive. Here is the typical folder structure inside it:

    +--docProps
    |  +  app.xml
    |  \  core.xml
    +  res.log
    +--word //this folder contains most of the files that control the content of the document
    |  +  document.xml //Is the actual content of the document
    |  +  endnotes.xml
    |  +  fontTable.xml
    |  +  footer1.xml //Contains the elements in the footer of the document
    |  +  footnotes.xml
    |  +--media //This folder contains all images embedded in the Word document
    |  |  \  image1.jpeg
    |  +  settings.xml
    |  +  styles.xml
    |  +  stylesWithEffects.xml
    |  +--theme
    |  |  \  theme1.xml
    |  +  webSettings.xml
    |  \--_rels
    |     \  document.xml.rels //this file tells Word where the images are located
    +  [Content_Types].xml
    \--_rels
       \  .rels
    

    It seems that your archive contains only what is inside the word folder, is that right? If this doesn't help, could you either send the corrupted .docx or post the folder structure inside your zip?
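
    To get that folder structure out of the zip, something like the following Python sketch could be used. It lists the entries and reports which of the expected top-level parts are missing. The name inspect_docx and the REQUIRED_PARTS list are my own choices for illustration, and word/document.xml is only the conventional location of the main part, so treat a "missing" result as a hint rather than proof.

```python
import zipfile

# Assumption: the parts a Word-produced package normally contains at these paths.
REQUIRED_PARTS = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def inspect_docx(path):
    """Return (entry names, missing expected parts) for a .docx path or file-like object."""
    # zipfile.ZipFile raises zipfile.BadZipFile if the container itself is damaged.
    with zipfile.ZipFile(path) as archive:
        bad_entry = archive.testzip()  # first entry whose CRC check fails, or None
        if bad_entry is not None:
            raise zipfile.BadZipFile(f"CRC check failed for {bad_entry}")
        names = archive.namelist()
    missing = [part for part in REQUIRED_PARTS if part not in names]
    return names, missing
```

    If the archive was built by zipping only the contents of the word folder, document.xml shows up at the root and all three expected parts are reported as missing.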

  • 2021-02-05 08:01

    Many years late, but I found this which actually worked for me. (From https://msdn.microsoft.com/en-us/library/office/bb497334.aspx)

    (wordDoc is a WordprocessingDocument)

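    using System;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Validation;

    // "path" is a placeholder: point it at the .docx you want to check.
    var path = "document.docx";

    try
    {
        // Open read-only; a badly damaged package throws here and is reported by the catch below.
        using (var wordDoc = WordprocessingDocument.Open(path, false))
        {
            var validator = new OpenXmlValidator();
            var count = 0;
            foreach (var error in validator.Validate(wordDoc))
            {
                count++;
                Console.WriteLine("Error " + count);
                Console.WriteLine("Description: " + error.Description);
                Console.WriteLine("ErrorType: " + error.ErrorType);
                Console.WriteLine("Node: " + error.Node);
                // Guard against a null Path or Part so that reporting itself cannot throw.
                Console.WriteLine("Path: " + error.Path?.XPath);
                Console.WriteLine("Part: " + error.Part?.Uri);
                Console.WriteLine("-------------------------------------------");
            }

            Console.WriteLine("count={0}", count);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }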
    
  • 2021-02-05 08:06

    A web-based docx validator worked for me: http://ucd.eeonline.org/validator/index.php
