How can I insert a checkbox form into a .docx file using python-docx?

ⅰ亾dé卋堺 submitted on 2019-11-29 17:30:40
Crudough

I've finally been able to accomplish this after lots of digging and help from @scanny.

Checkboxes can be inserted into any paragraph in python-docx using the following function. I am inserting a checkbox into specific cells in a table.

```python
from docx.oxml import OxmlElement
from docx.oxml.ns import qn


def addCheckbox(para, box_id, name):
    """Append a legacy FORMCHECKBOX field, wrapped in a bookmark, to para."""
    # Run 1: field-begin marker carrying the binary form-field data
    run = para.add_run()
    tag = run._r
    fld = OxmlElement('w:fldChar')
    fld.set(qn('w:fldCharType'), 'begin')
    fldData = OxmlElement('w:fldData')
    # Copied from a checkbox created in Word (see the note below)
    fldData.text = '/////2UAAAAUAAYAQwBoAGUAYwBrADEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'
    fldData.set(qn('xml:space'), 'preserve')
    fld.append(fldData)
    tag.append(fld)

    # Run 2: bookmark start; box_id and name identify the checkbox
    run2 = para.add_run()
    tag2 = run2._r
    start = OxmlElement('w:bookmarkStart')
    start.set(qn('w:id'), str(box_id))
    start.set(qn('w:name'), name)
    tag2.append(start)

    # Run 3: field instruction telling Word this field is a checkbox
    run3 = para.add_run()
    tag3 = run3._r
    instr = OxmlElement('w:instrText')
    instr.text = 'FORMCHECKBOX'
    tag3.append(instr)

    # Run 4: field-end marker
    run4 = para.add_run()
    tag4 = run4._r
    fld2 = OxmlElement('w:fldChar')
    fld2.set(qn('w:fldCharType'), 'end')
    tag4.append(fld2)

    # Run 5: bookmark end, matched to the start by w:id
    run5 = para.add_run()
    tag5 = run5._r
    end = OxmlElement('w:bookmarkEnd')
    end.set(qn('w:id'), str(box_id))
    end.set(qn('w:name'), name)
    tag5.append(end)
```

The fldData.text value looks random, but it was taken from the XML that Word generated for a document containing an existing checkbox. The function fails without setting this text. I have not confirmed it, but I have heard of one case where a developer changed the string arbitrarily, and once the document was saved it reverted to the original Word-generated value.
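The string can be decoded with the standard library to see that it is structured data rather than noise. This is a sketch: the field layout it assumes (a 4-byte version, three 2-byte fields, then a length-prefixed UTF-16LE name) comes from the legacy binary Word form-field format (the FFData structure in the [MS-DOC] specification), not from anything python-docx documents, and `decode_fld_data` is a hypothetical helper.

```python
import base64
import struct

# The fldData string used in addCheckbox, copied verbatim.
FLD_DATA = '/////2UAAAAUAAYAQwBoAGUAYwBrADEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'


def decode_fld_data(text):
    """Decode a w:fldData base64 string and unpack its leading fields.

    Assumed layout (from the legacy FFData format, not python-docx):
    version (uint32), bits (uint16), cch (uint16), hps (uint16),
    then the field name as a uint16 length followed by UTF-16LE chars.
    """
    # Pad defensively in case the stored string dropped its '=' padding
    raw = base64.b64decode(text + '=' * (-len(text) % 4))
    version, bits, cch, hps, name_len = struct.unpack_from('<IHHHH', raw, 0)
    name = raw[12:12 + 2 * name_len].decode('utf-16-le')
    return {'version': version, 'bits': bits, 'cch': cch, 'hps': hps, 'name': name}


if __name__ == '__main__':
    print(decode_fld_data(FLD_DATA))
```

Decoding this particular string yields the field name `Check1` and a size of 20 half-points, which suggests the blob carries Word's default name for the checkbox. That might explain why a hand-edited string gets regenerated on save, but I have not verified this.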

The key thing with these workaround functions is to have an example of XML that works, and to be able to compare the XML you generate against it. If you generate XML that matches the working example, it will work every time. opc-diag is handy for inspecting the XML in a Word document. Working with really small documents (such as a single paragraph or a two-row table) for analysis purposes makes it a lot easier to work out how Word is structuring the XML.
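One way to do that comparison mechanically is to canonicalize both fragments and compare the strings. This is a minimal sketch using `xml.etree.ElementTree.canonicalize` (Python 3.8+); `same_xml` is a hypothetical helper, and the two `w:r` fragments are illustrative stand-ins for "XML copied from a Word-saved file" and "XML your code produced".

```python
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def same_xml(a, b):
    """Return True if two XML strings match after C14N 2.0 canonicalization.

    strip_text=True trims whitespace around text nodes, so indentation
    differences are ignored; element order and attribute values still
    have to match exactly.
    """
    ca = ET.canonicalize(xml_data=a, strip_text=True)
    cb = ET.canonicalize(xml_data=b, strip_text=True)
    return ca == cb


working = ('<w:r xmlns:w="%s"><w:fldChar w:fldCharType="begin"/>'
           '<w:instrText>FORMCHECKBOX</w:instrText></w:r>' % W_NS)

# Same content, pretty-printed: should compare equal
pretty = '''<w:r xmlns:w="%s">
  <w:fldChar w:fldCharType="begin"/>
  <w:instrText>FORMCHECKBOX</w:instrText>
</w:r>''' % W_NS

# Same elements in the wrong order: should compare unequal
swapped = ('<w:r xmlns:w="%s"><w:instrText>FORMCHECKBOX</w:instrText>'
           '<w:fldChar w:fldCharType="begin"/></w:r>' % W_NS)
```

When the result is False, printing the two canonical strings side by side (or feeding them to `difflib`) usually shows the offending element quickly.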

An important thing to note is that the XML elements in a Word document are sequence sensitive, meaning the child elements within any other element generally have a set order in which they must appear. If you get this swapped around, you get the "repair" error you mentioned.
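Because of this, inserting a new child means finding the first existing sibling that must come after it. The sketch below does that with the standard library; `insert_in_order` is a hypothetical helper, and `PPR_ORDER` is only a partial, hand-copied slice of the `w:pPr` child order from wml.xsd (the real sequence is much longer). python-docx's own oxml base class has, as far as I know, a similar `insert_element_before()` method.

```python
import xml.etree.ElementTree as ET

W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

# Partial w:pPr child order (a subset of CT_PPrBase, relative order preserved)
PPR_ORDER = [W + t for t in (
    'pStyle', 'keepNext', 'keepLines', 'pageBreakBefore',
    'widowControl', 'spacing', 'ind', 'jc',
)]


def insert_in_order(parent, child, order):
    """Insert child before the first existing sibling that must follow it.

    If no such sibling exists, child is appended at the end.
    """
    successors = set(order[order.index(child.tag) + 1:])
    for i, existing in enumerate(parent):
        if existing.tag in successors:
            parent.insert(i, child)
            return child
    parent.append(child)
    return child


ppr = ET.Element(W + 'pPr')
ET.SubElement(ppr, W + 'pStyle')
ET.SubElement(ppr, W + 'jc')
# Both inserts land in schema order rather than at the end
insert_in_order(ppr, ET.Element(W + 'spacing'), PPR_ORDER)
insert_in_order(ppr, ET.Element(W + 'keepNext'), PPR_ORDER)
```

After the two calls the children are `pStyle, keepNext, spacing, jc`; a plain `append()` would have put `spacing` and `keepNext` after `jc`, which is the kind of mistake that triggers Word's repair prompt.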

I find it much easier to manipulate the XML from within python-docx, as it takes care of all the unzipping and rezipping for you, along with a lot of the other details.
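For context, a .docx file is a ZIP package of XML parts, and the unzipping step alone looks roughly like this with the standard library. This is a read-only sketch; `read_part` and `list_parts` are hypothetical helpers, and `word/document.xml` is the usual location of the main document part.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def list_parts(docx):
    """Return the names of all parts in a .docx (path or binary file object)."""
    with zipfile.ZipFile(docx) as zf:
        return zf.namelist()


def read_part(docx, part_name='word/document.xml'):
    """Parse one XML part of a .docx package and return its root element."""
    with zipfile.ZipFile(docx) as zf:
        return ET.fromstring(zf.read(part_name))
```

Writing a modified part back is where it gets fiddly: for example, ElementTree renames namespace prefixes to `ns0`, `ns1`, ... on output unless they are registered, and every other part has to be copied into the new ZIP unchanged. Those are the kinds of details python-docx's lxml-based handling spares you.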

To get the sequencing right, you'll need to be familiar with the XML Schema specifications for the elements you're working with. There is an example here: http://python-docx.readthedocs.io/en/latest/dev/analysis/features/text/paragraph-format.html

The full schema is in the python-docx code tree under ref/xsd/. Most of the elements for text are in the wml.xsd file (wml stands for WordProcessing Markup Language).
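To pull the expected child order for one element type out of an XSD file, a simplified sketch like the following can help. `child_order` is a hypothetical helper; it only collects `xsd:element` names/refs and `xsd:group` refs in document order, flattening any `xsd:choice` branches, so treat its output as a starting point rather than a full schema interpretation. The `SAMPLE` schema is a cut-down illustration with a few `CT_PPrBase` children, not the real file.

```python
import xml.etree.ElementTree as ET

XSD = '{http://www.w3.org/2001/XMLSchema}'


def child_order(xsd_text, type_name):
    """Return child names, in order, from a complexType's xsd:sequence.

    Group references are returned as 'group:NAME'; choices are flattened.
    """
    root = ET.fromstring(xsd_text)
    for ctype in root.iter(XSD + 'complexType'):
        if ctype.get('name') == type_name:
            seq = ctype.find(XSD + 'sequence')
            if seq is None:
                return []
            names = []
            for node in seq.iter():  # document order, includes nested nodes
                if node.tag == XSD + 'element':
                    names.append(node.get('name') or node.get('ref'))
                elif node.tag == XSD + 'group':
                    names.append('group:' + node.get('ref'))
            return names
    raise KeyError(type_name)


SAMPLE = '''<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="CT_PPrBase">
    <xsd:sequence>
      <xsd:element name="pStyle" type="CT_String" minOccurs="0"/>
      <xsd:element name="keepNext" type="CT_OnOff" minOccurs="0"/>
      <xsd:element name="spacing" type="CT_Spacing" minOccurs="0"/>
      <xsd:element name="jc" type="CT_Jc" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>'''
```

Against a python-docx checkout you would read `ref/xsd/wml.xsd` into a string and call, say, `child_order(text, 'CT_PPrBase')`; I have not checked every type in that file with this helper.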

You can find examples of other so-called "workaround functions" by searching for "python-docx workaround function". Pay particular attention to the parse_xml() function and the OxmlElement class, which let you create new XML subtrees and individual elements, respectively. XML elements can be positioned using the regular lxml.etree._Element methods; all XML elements in python-docx are based on lxml. http://lxml.de/api/lxml.etree._Element-class.html
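The two approaches (parse a subtree from a string versus build one element at a time) can be illustrated without python-docx installed. This sketch uses the standard library's ElementTree as a stand-in for lxml; the `qn` function here is a hypothetical re-implementation that mirrors what `docx.oxml.ns.qn` does (prefix to Clark notation). In python-docx itself you would typically use `parse_xml()` with `nsdecls('w')` to declare the prefix.

```python
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
NSMAP = {'w': W_NS}


def qn(tag):
    """Convert 'w:fldChar' to Clark notation '{namespace}fldChar'."""
    prefix, local = tag.split(':')
    return '{%s}%s' % (NSMAP[prefix], local)


# Subtree from a string (the parse_xml() approach). The w: prefix must be
# declared inside the string itself, or parsing fails.
fld_begin = ET.fromstring(
    '<w:fldChar xmlns:w="%s" w:fldCharType="begin"/>' % W_NS
)

# Single element built by hand (the OxmlElement approach)
instr = ET.Element(qn('w:instrText'))
instr.text = 'FORMCHECKBOX'

# Attach both to a run; ElementTree only offers append()/insert(index, ...),
# whereas lxml elements also have addnext()/addprevious().
run = ET.Element(qn('w:r'))
run.append(fld_begin)
run.append(instr)
```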
