Python: Create a “Table Of Contents” with python-docx/lxml

强颜欢笑 提交于 2019-11-27 16:49:32

问题


I'm trying to automate the creation of .docx files (WordML) with the help of python-docx (https://github.com/mikemaccana/python-docx). My current script creates the ToC manually with following loop:

for chapter in myChapters:
    body.append(paragraph(chapter.text, style='ListNumber'))

Does anyone know of a way to use the "word built-in" ToC-function, which adds the index automatically and also creates paragraph-links to the individual chapters?

Thanks a lot!


回答1:


The key challenge is that a rendered ToC depends on pagination to know what page number to put for each heading. Pagination is a function provided by the layout engine, a very complex piece of software built into the Word client. Writing a page layout engine in Python is probably not a good idea, definitely not a project I'm planning to undertake anytime soon :)

The ToC is composed of two parts:

  1. the element that specifies the ToC placement and things like which heading levels to include.
  2. the actual visible ToC content, headings and page numbers with dotted lines connecting them.

Creating the element is pretty straightforward and relatively low-effort. Creating the actual visible content, at least if you want the page numbers included, requires the Word layout engine.

These are the options:

  1. Just add the tag and a few other bits to signal Word the ToC needs to be updated. When the document is first opened, a dialog box appears saying links need to be refreshed. The user clicks Yes and Bob's your uncle. If the user clicks No, the ToC title appears with no content below it and the ToC can be updated manually.

  2. Add the tag and then engage a Word client, by means of C# or Visual Basic against the Word Automation library, to open and save the file; all the fields (including the ToC field) get updated.

  3. Do the same thing server-side if you have a SharePoint instance or whatever that can do it with Word Automation Services.

  4. Create an AutoOpen macro in the document that automatically runs the field update when the document is opened. Probably won't pass a lot of virus checkers and won't work on locked-down Windows builds common in a corporate setting.

Here's a very nice set of screencasts by Eric White that explain all the hairy details




回答2:


Sorry for adding comments to an old post, but I think it may be helpful. This is not my solution, but it has been found there: https://github.com/python-openxml/python-docx/issues/36 Thanks to https://github.com/mustash and https://github.com/scanny

    from docx.oxml.ns import qn
    from docx.oxml import OxmlElement

    paragraph = self.document.add_paragraph()
    run = paragraph.add_run()
    fldChar = OxmlElement('w:fldChar')  # creates a new element
    fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
    instrText = OxmlElement('w:instrText')
    instrText.set(qn('xml:space'), 'preserve')  # sets attribute on element
    instrText.text = 'TOC \\o "1-3" \\h \\z \\u'   # change 1-3 depending on heading levels you need

    fldChar2 = OxmlElement('w:fldChar')
    fldChar2.set(qn('w:fldCharType'), 'separate')
    fldChar3 = OxmlElement('w:t')
    fldChar3.text = "Right-click to update field."
    fldChar2.append(fldChar3)

    fldChar4 = OxmlElement('w:fldChar')
    fldChar4.set(qn('w:fldCharType'), 'end')

    r_element = run._r
    r_element.append(fldChar)
    r_element.append(instrText)
    r_element.append(fldChar2)
    r_element.append(fldChar4)
    p_element = paragraph._p



回答3:


@Mawg // Updating ToC

Had the same issue to update the ToC and googled for it. Not my code, but it works:

word = win32com.client.DispatchEx("Word.Application")
doc = word.Documents.Open(input_file_name)
doc.TablesOfContents(1).Update()
doc.Close(SaveChanges=True)
word.Quit()


来源:https://stackoverflow.com/questions/18595864/python-create-a-table-of-contents-with-python-docx-lxml

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!