Question
How do I identify a new page, or some identifier that denotes a page's number, using python-docx? I've looked through the docs without success so far, and I have also tried looking for the WD_BREAK.PAGE attribute, but this feature is not yet supported. All help is appreciated, thanks.
Answer 1:
The short answer is that you can't reliably determine soft page breaks from a .docx file. You can identify hard page breaks and you may be able to detect where Word broke pages the last time it "flowed" the document.
A Word document is a "flowed" document, meaning that Word's layout engine "flows" the text of the document into a page until it runs out of room, then creates a new page into which it flows the remaining text. These "soft" page breaks are not specified in the .docx file; they are determined by Word at the time of rendering, either for display or printing. This makes sense because whenever you change, for example, the margins, the pages may break at different locations.
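To make the point about margins concrete, here is a toy model. It is *not* Word's layout engine, only an illustration under simplifying assumptions: words are greedily wrapped into lines of a fixed character width, and lines are grouped into pages of a fixed number of lines. The function name `page_starts` and the idea of measuring width in characters are hypothetical. The only takeaway is that the same words start pages at different positions once the "margins" (here, the line width) change, so no fixed page position could have been stored with the text.

```python
def page_starts(words, chars_per_line, lines_per_page):
    """Toy flow model (NOT Word's algorithm): return the 0-based indices of
    the words that begin each page when words are greedily wrapped into
    lines of chars_per_line characters and lines_per_page lines per page."""
    starts = [0] if words else []
    line_len = 0        # characters already on the current line
    lines_on_page = 1   # the current, partially filled line counts as one
    for i, word in enumerate(words):
        if line_len == 0:
            line_len = len(word)          # first word on a line always goes there
        elif line_len + 1 + len(word) <= chars_per_line:
            line_len += 1 + len(word)     # word plus its separating space fits
        else:
            # Wrap: this word starts a new line, and possibly a new page.
            line_len = len(word)
            if lines_on_page == lines_per_page:
                starts.append(i)          # page is full, so a new page begins here
                lines_on_page = 1
            else:
                lines_on_page += 1
    return starts


if __name__ == "__main__":
    words = "the quick brown fox jumps over the lazy dog".split() * 4
    # Same text, two different "margins":
    print(page_starts(words, chars_per_line=20, lines_per_page=3))  # [0, 12, 24, 35]
    print(page_starts(words, chars_per_line=30, lines_per_page=3))  # [0, 18]
```

Widening the line from 20 to 30 characters changes both the number of pages and where each one starts, which is why the "soft" breaks are computed at render time rather than saved.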
An implication of this is that the .docx file does not contain markup identifying where the following text should flow onto a new page.
A hard page break is one explicitly inserted by the document author to make the following content start on a new page regardless of whether the current page is full. These are implemented as a break element with a page type, <w:br w:type="page"/>, inside a run (<w:r>), and can be detected.
As an aid to assistive technologies, like a voice reader for the visually impaired, Word may insert <w:lastRenderedPageBreak> elements. I don't know much about these elements or the circumstances under which Word inserts them, but they might be an avenue worth exploring.
Source: https://stackoverflow.com/questions/23980268/find-a-new-page-in-a-word-document