Text-Replace in docx and save the changed file with python-docx

≯℡__Kan透↙ 提交于 2019-11-27 18:02:11

问题


I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file, but with the word replaced. How am I supposed to do this?

The docx module has a savedocx that takes 7 inputs:

  • document
  • coreprops
  • appprops
  • contenttypes
  • websettings
  • wordrelationships
  • output

How do I keep everything in my original file the same except for the replaced word?


回答1:


As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.

Howewer, here is how you could do it:

First, have a look at the docx tag wiki:

It explains how the docx file can be unzipped: Here's how a typical file looks like:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml //Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml //Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels //this document tells word where the images are situated
+  [Content_Types].xml
\--_rels
   \  .rels

Docx only gets one part of the document, in the method opendocx

def opendocx(file):
    '''Open a docx file, return a document XML tree'''
    mydoc = zipfile.ZipFile(file)
    xmlcontent = mydoc.read('word/document.xml')
    document = etree.fromstring(xmlcontent)
    return document

It only gets the document.xml file.

What I recommend you to do is:

  1. get the content of the document with **opendocx*
  2. Replace the document.xml with the advReplace method
  3. Open the docx as a zip, and replace the document.xml content's by the new xml content.
  4. Close and output the zipped file (renaming it to output.docx)

If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.




回答2:


this worked for me:

def docx_replace(old_file,new_file,rep):
    zin = zipfile.ZipFile (old_file, 'r')
    zout = zipfile.ZipFile (new_file, 'w')
    for item in zin.infolist():
        buffer = zin.read(item.filename)
        if (item.filename == 'word/document.xml'):
            res = buffer.decode("utf-8")
            for r in rep:
                res = res.replace(r,rep[r])
            buffer = res.encode("utf-8")
        zout.writestr(item, buffer)
    zout.close()
    zin.close()



回答3:


Are you using the docx module from here?

If yes, then the docx module already exposes methods like replace, advReplace etc which can help you achieve your task. Refer to the source code for more details of the exposed methods.




回答4:


I've forked a repo of python-docx here, which preserves all of the preexisting data in a docx file, including formatting. Hopefully this is what you're looking for.




回答5:


In addition to @ramil, you have to escape some characters before placing them as string values into the XML, so this worked for me:

def escape(escapee):
  escapee = escapee.replace("&", "&")
  escapee = escapee.replace("<", "&lt;")
  escapee = escapee.replace(">", "&gt;")
  escapee = escapee.replace("\"", "&quot;")
  escapee = escapee.replace("'", "&apos;")
return escapee



回答6:


from docx import Document
file_path = 'C:/tmp.docx'
document = Document(file_path)

def docx_replace(doc_obj, data: dict):
    """example: data=dict(order_id=123), result: {order_id} -> 123"""
    for paragraph in doc_obj.paragraphs:
        for key, val in data.items():
            key_name = '{{{}}}'.format(key)
            if key_name in paragraph.text:
                paragraph.text = paragraph.text.replace(key_name, str(val))
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace(cell, data)

docx_replace(document, dict(order_id=123, year=2018, payer_fio='payer_fio', payer_fio1='payer_fio1'))
document.save(file_path)



回答7:


The problem with the methods above is that they lose the existing formatting. Please see my answer which performs the replace and retains formatting.

There is also python-docx-template which allows jinja2 style templating within a docx template. Here's a link to the documentation



来源:https://stackoverflow.com/questions/17850227/text-replace-in-docx-and-save-the-changed-file-with-python-docx

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!