Question
Consider a user who needs the text of a docx document, without the headers and footers, for processing in R.
If file.docx is renamed to file.zip and the document.xml inside it is examined, it turns out to be a well-formed XML document containing the text.
Did Microsoft (or another developer) publish a schema for this document.xml subfile in the ZIP package of a docx file?
The file looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="00F447D7" w:rsidRPr="00C63308" w:rsidRDefault="00F447D7">
      <w:pPr>
Answer 1:
From Wikipedia:
The format was initially standardised by Ecma (as ECMA-376) and, in later versions, by ISO and IEC (as ISO/IEC 29500).
You can find various versions of the XSD in the ECMA-376 downloads.
document.xml conforms to the WordprocessingML part of the schemas (look for wml.xsd).
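As a rough illustration of why the headers and footers are easy to leave out: they are stored in separate parts of the package (word/header*.xml, word/footer*.xml), so reading only the main part skips them. The sketch below is in Python rather than R and uses only the standard library. The function name docx_body_paragraphs is hypothetical, and the code assumes the main part sits at word/document.xml, which is where Word places it. It collects only w:t text runs, so tabs, line breaks, and similar elements are ignored, and paragraphs nested inside text boxes may be reported more than once.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI bound to the "w" prefix in the document.xml excerpt above.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_body_paragraphs(source):
    """Return the text of each w:p paragraph in the main document part.

    `source` is a path or a file-like object for a .docx package.
    Hypothetical helper: assumes the main part is word/document.xml.
    """
    with zipfile.ZipFile(source) as package:
        # Only the main part is read; header and footer parts are never opened.
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Join the w:t text runs that belong to this paragraph.
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs
```

Calling docx_body_paragraphs("file.docx") returns a list of strings, one per paragraph, which could be written to a plain text file and then read into R.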
Answer 2:
I think this might be the location: http://msdn.microsoft.com/en-us/library/hh643329(v=office.12).aspx
This is version 5.2. On this page you can find a link to version 5.1.
Source: https://stackoverflow.com/questions/18660653/where-to-find-the-schema-xsd-file-for-microsoft-docx-format