I have an issue where .doc and .pdf files are coming out OK but a .docx file is coming out corrupt.
To solve that, I am trying to debug why the .docx is corrupt.
I used the "Open XML SDK 2.5 Productivity Tool" (http://www.microsoft.com/en-us/download/details.aspx?id=30425) to find a problem with a broken hyperlink reference.
You have to download/install the SDK first, then the tool. The tool will open and analyze the document for problems.
Usually, when there is an error in a particular XML file, Word tells you which line of which file the error is on. So I believe the problem comes from either the way the file is zipped or the folder structure.
Here is the folder structure of a Word file. The .docx format is a zipped file that contains the following folders:
+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //contains the elements in the footer of the document
|  +  footnotes.xml
|  +--media //this folder contains all images embedded in the Word document
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this file tells Word where the images are located
+  [Content_Types].xml
\--_rels
   \  .rels
It seems that you have only what is inside the word folder, is that right? If this doesn't work, could you please either send the corrupted .docx or post the folder structure inside your zip?
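If you would rather check the structure from code than unzip the file by hand, a rough C# sketch along these lines could list the entries in the archive and flag the top-level parts from the tree above that are absent. The names DocxStructureCheck and FindMissingParts and the path "document.docx" are made up for illustration, and the list of expected parts is my assumption based on the tree above, not an exhaustive rule. On .NET Framework, ZipFile also needs a reference to System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

public static class DocxStructureCheck
{
    // Assumption: parts from the tree above that a Word package typically contains.
    static readonly string[] ExpectedParts =
    {
        "[Content_Types].xml",
        "_rels/.rels",
        "word/document.xml",
    };

    // Returns the expected parts that do not appear among the given entry names.
    public static List<string> FindMissingParts(IEnumerable<string> entryNames)
    {
        // Names are compared case-insensitively. Separators are NOT normalized,
        // so an entry written as "word\document.xml" is reported as missing.
        var present = new HashSet<string>(entryNames, StringComparer.OrdinalIgnoreCase);
        return ExpectedParts.Where(p => !present.Contains(p)).ToList();
    }

    public static void Main(string[] args)
    {
        // Hypothetical default path; pass the real file as the first argument.
        var path = args.Length > 0 ? args[0] : "document.docx";

        using (ZipArchive zip = ZipFile.OpenRead(path))
        {
            // Print every entry so the layout can be compared with the tree above.
            foreach (var entry in zip.Entries)
                Console.WriteLine(entry.FullName);

            foreach (var missing in FindMissingParts(zip.Entries.Select(e => e.FullName)))
                Console.WriteLine("Missing: " + missing);
        }
    }
}
```

If every entry shows up under an extra parent folder, or with backslashes instead of forward slashes, that points at the zipping step rather than at the XML itself.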
Many years late, but I found this which actually worked for me. (From https://msdn.microsoft.com/en-us/library/office/bb497334.aspx)
(wordDoc is a WordprocessingDocument.)
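    using System;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Validation;

    try
    {
        // "document.docx" is a placeholder path; false opens the file read-only.
        using (var wordDoc = WordprocessingDocument.Open("document.docx", false))
        {
            var validator = new OpenXmlValidator();
            var count = 0;
            // Validate returns one ValidationErrorInfo per problem it finds.
            foreach (var error in validator.Validate(wordDoc))
            {
                count++;
                Console.WriteLine("Error " + count);
                Console.WriteLine("Description: " + error.Description);
                Console.WriteLine("ErrorType: " + error.ErrorType);
                Console.WriteLine("Node: " + error.Node);
                // Null-conditional access, in case an error carries no path or part.
                Console.WriteLine("Path: " + error.Path?.XPath);
                Console.WriteLine("Part: " + error.Part?.Uri);
                Console.WriteLine("-------------------------------------------");
            }
            Console.WriteLine("count={0}", count);
        }
    }
    catch (Exception ex)
    {
        // Reached when the package cannot be opened at all, or validation itself fails.
        Console.WriteLine(ex.Message);
    }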
A web-based .docx validator worked for me: http://ucd.eeonline.org/validator/index.php