Question
For a Paragraph object, how can I determine which page it is located on using the Open XML SDK 2.5?
I've obtained all child elements in my document, and fetched their inner text as well, using this:
foreach (var i in mainPart.Document.ChildElements.FirstOrDefault().ChildElements)
{
    ParagraphElements.Add(i); // list of OpenXmlElement
}
I want to get the page number for the corresponding paragraph. For example, I have "this is heading 1" marked with the style Heading 1, and it will be listed in the TOC, so that is where I need to supply the page number.
Thanks in advance
Answer 1:
Pages do not exist in the OpenXML format until they are rendered by a word processor.
The metadata necessary to calculate on which page a given paragraph should appear is available, but it is far from a straightforward operation.
To verify that page numbers do not exist in the raw OpenXML markup:
- Rename a copy of your Word document ending with ".docx" to end with ".zip".
- Within this zip archive, open the sub-directory named "word".
- Within "word" open "document.xml".
This file contains the XML content behind your mainPart.Document call. The "document.xml" file has a single root node, <document>...</document>, which in turn has a single child node, <body>...</body>, which holds the content you're interested in.
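The same structure can be checked without renaming anything, because a .docx file is an ordinary zip archive. The following is a minimal sketch in Python using only the standard library (not the Open XML SDK). The function name body_child_tags is made up for illustration, and the demo archive is a stand-in that holds only word/document.xml rather than a complete package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_child_tags(docx):
    """Return the local names of the direct children of <w:body>.

    `docx` is a path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return []
    # ElementTree tags look like "{namespace}local"; keep only the local part.
    return [child.tag.rsplit("}", 1)[-1] for child in body]


# Stand-in archive for the demo: only word/document.xml, not a full .docx.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}">'
        "<w:body><w:p/><w:tbl/><w:sectPr/></w:body>"
        "</w:document>",
    )
demo.seek(0)
print(body_child_tags(demo))  # prints ['p', 'tbl', 'sectPr']
```

Paragraphs, tables and section properties show up in the list, but nothing in it identifies a page.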
When working with OpenXML documents, I find that the abstractions in the OpenXML SDK can sometimes be distracting. Thankfully, it's simple to explore the raw markup with LINQ-to-XML. For example, your call to:
var childrenFromOpenXmlSdk = mainPart.Document.ChildElements.Single().ChildElements;
is equivalent to the following in LINQ-to-XML:
IEnumerable<XElement> childrenFromLinqToXml =
    XElement.Load("[path]/[file]/word/document.xml")
            .Elements()
            .Single()
            .Elements();
Inspecting the elements in childrenFromLinqToXml, you'll find no page number information.
You may see cached page numbers in the raw markup of the TOC itself, but these are artifacts of the previous rendering, stored inside content controls or field codes.
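To see where those cached numbers live, here is a rough Python sketch (standard library only) that walks the field markers and collects the last stored result of every PAGEREF field. The function name cached_pageref_values is hypothetical, and the sketch assumes simple field markup built from w:fldChar runs; it does not handle w:fldSimple.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FLD_CHAR = f"{{{W_NS}}}fldChar"
FLD_CHAR_TYPE = f"{{{W_NS}}}fldCharType"
INSTR_TEXT = f"{{{W_NS}}}instrText"
TEXT = f"{{{W_NS}}}t"


def cached_pageref_values(xml_text):
    """Map bookmark name -> cached result text for each PAGEREF field."""
    found = {}
    stack = []  # one entry per open field, so nested fields (PAGEREF in TOC) work
    for el in ET.fromstring(xml_text).iter():  # iter() walks in document order
        if el.tag == FLD_CHAR:
            kind = el.get(FLD_CHAR_TYPE)
            if kind == "begin":
                stack.append({"instr": "", "result": "", "in_result": False})
            elif kind == "separate" and stack:
                stack[-1]["in_result"] = True
            elif kind == "end" and stack:
                field = stack.pop()
                parts = field["instr"].split()
                if len(parts) >= 2 and parts[0].upper() == "PAGEREF":
                    found[parts[1]] = field["result"]
        elif el.tag == INSTR_TEXT and stack and not stack[-1]["in_result"]:
            stack[-1]["instr"] += el.text or ""
        elif el.tag == TEXT and stack and stack[-1]["in_result"]:
            stack[-1]["result"] += el.text or ""
    return found


SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:instrText xml:space="preserve"> PAGEREF _Toc481680509 \\h </w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
    "<w:r><w:t>2</w:t></w:r>"
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    "</w:p>"
)
print(cached_pageref_values(SAMPLE))  # prints {'_Toc481680509': '2'}
```

The values returned are whatever the last rendering left behind; they say nothing about where the bookmark would land if the document were laid out again.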
If you need to build up the TOC programmatically, have a look at the following sites:
OfficeOpenXML.com's reference article for TOCs
- This is a helpful reference for the ECMA-376 specification of OpenXML.
Eric White's screencast "Exploring Tables-of-Contents in Open XML WordprocessingML Documents"
- Eric White is a leading authority on all things OpenXML. His ericwhite.com/blog is well worth a look when you find yourself at the intersection of XML markup and on-screen rendering.
--- Following up on Sai's comments ---
Hi Austin Drenski, I've created the TOC and added all headings programmatically. All I need is page numbers. Is there any alternative way to get the page number of a particular paragraph? I've gone through all the screencasts, but I'm looking for the page number alone.
<w:r>
  <w:fldChar w:fldCharType="begin" />
</w:r>
<w:r>
  <w:instrText xml:space="preserve"> PAGEREF _Toc481680509 \h </w:instrText>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="separate" />
</w:r>
<w:r>
  <w:t>2</w:t>
</w:r>
<w:r>
  <w:fldChar w:fldCharType="end" />
</w:r>
In that sample XML, "2" acts as the page number. It is hardcoded.
My TOC now works perfectly without page numbers. I also analysed the default MS Word behaviour: the first time, page numbers are written literally, as above.
You can programmatically place a content control, <w:sdt>, in the document as a child of the <w:body> element.
For a simple TOC with two entries:
<w:sdt>
  <w:sdtPr>
    <w:id w:val="429708664"/>
    <w:docPartObj>
      <w:docPartGallery w:val="Table of Contents"/>
      <w:docPartUnique/>
    </w:docPartObj>
  </w:sdtPr>
  <w:sdtContent>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="TOCHeading"/>
      </w:pPr>
      <w:r>
        <w:t>Contents</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="TOC1"/>
        <w:tabs>
          <w:tab w:val="right" w:leader="dot" w:pos="9350"/>
        </w:tabs>
      </w:pPr>
      <w:r>
        <w:fldChar w:fldCharType="begin"/>
      </w:r>
      <w:r>
        <w:instrText xml:space="preserve"> TOC \o "1-3" \h \z \u </w:instrText>
      </w:r>
      <w:r>
        <w:fldChar w:fldCharType="separate"/>
      </w:r>
      <w:hyperlink w:anchor="_Toc481654079" w:history="1">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Testing 1</w:t>
        </w:r>
        <w:r>
          <w:tab/>
        </w:r>
        <w:r>
          <w:fldChar w:fldCharType="begin"/>
        </w:r>
        <w:r>
          <w:instrText xml:space="preserve"> PAGEREF _Toc481654079 \h </w:instrText>
        </w:r>
        <w:r>
        </w:r>
        <w:r>
          <w:fldChar w:fldCharType="separate"/>
        </w:r>
        <w:r>
          <w:t>0</w:t>
        </w:r>
        <w:r>
          <w:fldChar w:fldCharType="end"/>
        </w:r>
      </w:hyperlink>
    </w:p>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="TOC1"/>
        <w:tabs>
          <w:tab w:val="right" w:leader="dot" w:pos="9350"/>
        </w:tabs>
      </w:pPr>
      <w:hyperlink w:anchor="_Toc481654080" w:history="1">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Testing 2</w:t>
        </w:r>
        <w:r>
          <w:tab/>
        </w:r>
        <w:r>
          <w:fldChar w:fldCharType="begin"/>
        </w:r>
        <w:r>
          <w:instrText xml:space="preserve"> PAGEREF _Toc481654080 \h </w:instrText>
        </w:r>
        <w:r>
          <w:fldChar w:fldCharType="separate"/>
        </w:r>
        <w:r>
          <w:t>0</w:t>
        </w:r>
        <w:r>
          <w:fldChar w:fldCharType="end"/>
        </w:r>
      </w:hyperlink>
    </w:p>
    <w:p>
      <w:r>
        <w:fldChar w:fldCharType="end"/>
      </w:r>
    </w:p>
  </w:sdtContent>
</w:sdt>
Note the use of PAGEREF field codes pointing at bookmarks. Also note the subsequent markup, <w:t>0</w:t>. When the document is opened and the field codes are updated, this zero will be replaced by the page number on which the bookmark is currently rendered.
Each time the document is paginated, the exact placement of a bookmark could change.
Once the zeros are replaced with actual page numbers, you will observe those numbers in the markup. However, they are simply the last rendered values for those field codes.
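A TOC entry of this shape can be generated rather than typed by hand. The sketch below is in Python with the standard library only; it builds one hyperlink entry whose PAGEREF result is a placeholder. The helper names w, _run and toc_entry are invented for this example, and the tab stops and Hyperlink run style from the markup above are left out to keep it short.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)  # serialize with the usual "w:" prefix


def w(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{name}"


def _run(parent, child_name, attrs=None, text=None):
    """Append <w:r><w:child_name .../></w:r> to parent (hypothetical helper)."""
    run = ET.SubElement(parent, w("r"))
    child = ET.SubElement(run, w(child_name), attrs or {})
    child.text = text
    return run


def toc_entry(title, bookmark, placeholder="0"):
    """Build one TOC1 paragraph whose page number is only a placeholder."""
    p = ET.Element(w("p"))
    ppr = ET.SubElement(p, w("pPr"))
    ET.SubElement(ppr, w("pStyle"), {w("val"): "TOC1"})
    link = ET.SubElement(
        p, w("hyperlink"), {w("anchor"): bookmark, w("history"): "1"}
    )
    _run(link, "t", text=title)
    _run(link, "tab")
    _run(link, "fldChar", {w("fldCharType"): "begin"})
    _run(link, "instrText", {XML_SPACE: "preserve"}, f" PAGEREF {bookmark} \\h ")
    _run(link, "fldChar", {w("fldCharType"): "separate"})
    _run(link, "t", text=placeholder)  # replaced when the fields are updated
    _run(link, "fldChar", {w("fldCharType"): "end"})
    return p


entry = toc_entry("Testing 1", "_Toc481654079")
print(ET.tostring(entry, encoding="unicode"))
```

The placeholder stays in the file until a word processor updates the field, which is the point made above: the number is written by the renderer, not by the code that builds the entry.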
In the document settings, you can prompt the user to update field codes upon opening, so that the TOC numbers will accurately reflect the current on-screen rendering. To do so, your settings file should resemble:
<w:settings ...namespaces omitted...>
  <w:updateFields w:val="true"/>
  ...other settings omitted...
</w:settings>
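If the settings part is edited as raw text, the flag can be switched on with a small helper. This Python sketch (standard library only) works on the string content of word/settings.xml and rests on several assumptions: the function name enable_update_fields is hypothetical, the part uses the customary w: prefix, and the element is simply placed last inside w:settings. The schema defines an order for the children of w:settings, so a strict validator may expect a different position; that is not handled here.

```python
import re

UPDATE_FIELDS = '<w:updateFields w:val="true"/>'

# Matches an existing element, self-closing or with an explicit end tag.
_EXISTING = re.compile(r"<w:updateFields\b[^>]*?(/>|>\s*</w:updateFields>)")


def enable_update_fields(settings_xml):
    """Return settings.xml text that contains <w:updateFields w:val="true"/>.

    Text-level edit, so every other byte of the part (namespace
    declarations included) is left exactly as it was.
    """
    if _EXISTING.search(settings_xml):
        # Overwrite whatever value is currently stored.
        return _EXISTING.sub(UPDATE_FIELDS, settings_xml, count=1)
    closing = "</w:settings>"
    if closing not in settings_xml:
        raise ValueError("no </w:settings> closing tag found")
    head, _, tail = settings_xml.rpartition(closing)
    return head + UPDATE_FIELDS + closing + tail


sample = (
    '<w:settings xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:zoom w:percent="100"/></w:settings>'
)
print(enable_update_fields(sample))
```

A text-level edit is used here on purpose: re-serializing the part through a generic XML library can rename namespace prefixes that other attributes in the file refer to by name.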
In the end, you still need to render the OpenXML document with a word processor, but you avoid the complexity of calculating page positions.
Answer 2:
After a lot of groundwork, I found that the page number cannot be retrieved from an OpenXML element. We can approximate it, but we cannot be sure, because page numbers are produced by the word processor's layout engine, which runs after all the OpenXML elements have been passed to the word processor. We can calculate an estimate with LastRenderedPageBreak, but we cannot be sure that the resulting location of the element is correct.
So, for an easier solution, I would suggest going with UpdateFieldsOnOpen or a macro.
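For completeness, the LastRenderedPageBreak approximation could look roughly like the Python sketch below (standard library only; the function name approximate_pages is made up). It counts only w:lastRenderedPageBreak markers, so it assumes the file was last saved by an application that wrote them. A document generated purely in code has none, and every paragraph would be reported on page 1.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
P = f"{{{W_NS}}}p"
T = f"{{{W_NS}}}t"
LAST_BREAK = f"{{{W_NS}}}lastRenderedPageBreak"


def approximate_pages(document_xml):
    """Return [(page_guess, paragraph_text), ...] in document order.

    The guess is 1 plus the number of lastRenderedPageBreak markers seen
    before the paragraph's first text. It is an estimate, not a layout.
    """
    page = 1
    paragraphs = []  # each item is [page_guess, text_so_far]
    for el in ET.fromstring(document_xml).iter():  # document order
        if el.tag == P:
            paragraphs.append([page, ""])
        elif el.tag == LAST_BREAK:
            page += 1
            # A break before any text means the paragraph starts on the new page.
            if paragraphs and not paragraphs[-1][1]:
                paragraphs[-1][0] = page
        elif el.tag == T and paragraphs:
            paragraphs[-1][1] += el.text or ""
    return [(guess, text) for guess, text in paragraphs]


sample = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>First</w:t></w:r></w:p>"
    "<w:p><w:r><w:lastRenderedPageBreak/><w:t>Second</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Long </w:t></w:r>"
    "<w:r><w:lastRenderedPageBreak/><w:t>tail</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Last</w:t></w:r></w:p>"
    "</w:body></w:document>"
)
for guess, text in approximate_pages(sample):
    print(guess, text)
```

A paragraph that spans a break is reported on the page where it starts. Paragraphs nested inside text boxes are not treated specially, and any edit made after the last rendering invalidates the markers, which is why the result cannot be trusted for an exact TOC.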
Source: https://stackoverflow.com/questions/43700252/how-to-get-page-numbers-based-on-openxmlelement