Text-Replace in docx and save the changed file with python-docx

小鲜肉 2021-01-02 07:04

I'm trying to use the python-docx module to replace a word in a file and save the new file, with the caveat that the new file must have exactly the same formatting as the old one.

8 answers
  • 2021-01-02 07:36

    I've forked a repo of python-docx here, which preserves all of the preexisting data in a docx file, including formatting. Hopefully this is what you're looking for.

  • 2021-01-02 07:37

    In addition to @ramil's answer, you have to escape some characters before inserting them as string values into the XML. This worked for me:

    def escape(escapee):
        # Escape "&" first, so the entities added below are not escaped again
        escapee = escapee.replace("&", "&amp;")
        escapee = escapee.replace("<", "&lt;")
        escapee = escapee.replace(">", "&gt;")
        escapee = escapee.replace("\"", "&quot;")
        escapee = escapee.replace("'", "&apos;")
        return escapee
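The standard library already covers this: `xml.sax.saxutils.escape` always escapes `&`, `<` and `>`, and accepts an optional dict of extra entities, which can be used for the two quote characters. A minimal sketch equivalent to the function above (`escape_xml_text` and `QUOTE_ENTITIES` are just illustrative names):

```python
from xml.sax.saxutils import escape as sax_escape

# Extra entities for the quote characters; "&", "<" and ">" are always escaped
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_text(value):
    """Escape a replacement value before inserting it into document.xml."""
    return sax_escape(str(value), QUOTE_ENTITIES)


print(escape_xml_text('Tom & "Jerry" <co>'))
# prints: Tom &amp; &quot;Jerry&quot; &lt;co&gt;
```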
    
  • 2021-01-02 07:42

    from docx import Document

    file_path = 'C:/tmp.docx'
    document = Document(file_path)

    def docx_replace(doc_obj, data: dict):
        """example: data=dict(order_id=123), result: {order_id} -> 123"""
        # Note: assigning paragraph.text replaces all of the paragraph's runs
        # with a single run, so run-level formatting (bold, fonts, ...) is lost
        for paragraph in doc_obj.paragraphs:
            for key, val in data.items():
                key_name = '{{{}}}'.format(key)  # e.g. '{order_id}'
                if key_name in paragraph.text:
                    paragraph.text = paragraph.text.replace(key_name, str(val))
        # Table cells also have .paragraphs and .tables, so recurse into them
        for table in doc_obj.tables:
            for row in table.rows:
                for cell in row.cells:
                    docx_replace(cell, data)

    docx_replace(document, dict(order_id=123, year=2018, payer_fio='payer_fio', payer_fio1='payer_fio1'))
    document.save(file_path)  # overwrites the original file
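In python-docx, assigning `paragraph.text` replaces the paragraph's existing runs with one new run, which is where the formatting goes. One workaround is to edit the text of the existing runs instead. Below is a minimal pure-Python sketch of that idea; `replace_in_runs` is a hypothetical helper (not part of python-docx) that works on a list of run strings and returns a list of the same length, so every run keeps its formatting. A placeholder split across runs takes on the formatting of the run where it starts.

```python
def replace_in_runs(texts, old, new):
    """Replace every occurrence of old in the joined run texts.

    texts is a list of run strings (e.g. [r.text for r in paragraph.runs]).
    Returns a list of the same length: each replacement is written into the run
    where the match starts, and the matched characters are cut from any later
    runs the match spans, so run boundaries (and their formatting) are kept.
    """
    texts = list(texts)
    if not old:
        return texts
    pos = 0  # where to resume searching in the joined text
    while True:
        start = "".join(texts).find(old, pos)
        if start == -1:
            return texts
        end = start + len(old)
        offset = 0
        inserted = False
        for i, t in enumerate(texts):
            run_start, run_end = offset, offset + len(t)
            offset = run_end
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue  # this run does not overlap the match
            a, b = lo - run_start, hi - run_start
            if not inserted:
                texts[i] = t[:a] + new + t[b:]  # first overlapping run gets the value
                inserted = True
            else:
                texts[i] = t[:a] + t[b:]  # later runs just lose the matched part
        # Skip past the inserted text so a value containing old is not re-matched
        pos = start + len(new)


print(replace_in_runs(["Dear {na", "me}", ", hi"], "{name}", "Bob"))
# ['Dear Bob', '', ', hi']

# Hypothetical use with python-docx (not run here):
#     runs = paragraph.runs
#     for run, text in zip(runs, replace_in_runs([r.text for r in runs], "{order_id}", "123")):
#         run.text = text
```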
    
  • 2021-01-02 07:43

    The problem with the methods above is that they lose the existing formatting. Please see my answer which performs the replace and retains formatting.

    There is also python-docx-template, which allows Jinja2-style templating within a docx template. Here's a link to the documentation.

  • 2021-01-02 07:48

    This worked for me:

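    import zipfile

    def docx_replace(old_file, new_file, rep):
        """Copy old_file to new_file, replacing each key of rep with its value
        in word/document.xml; all other parts are copied unchanged."""
        with zipfile.ZipFile(old_file, 'r') as zin, zipfile.ZipFile(new_file, 'w') as zout:
            for item in zin.infolist():
                buffer = zin.read(item.filename)
                if item.filename == 'word/document.xml':
                    res = buffer.decode("utf-8")
                    # Plain string replacement on the raw XML: a key only matches if
                    # it appears verbatim in the XML, and values must be XML-escaped
                    for old, new in rep.items():
                        res = res.replace(old, new)
                    buffer = res.encode("utf-8")
                # Passing the original ZipInfo keeps each entry's name and compression type
                zout.writestr(item, buffer)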
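Because this works on the raw XML, a key is only replaced if it appears verbatim in word/document.xml. Word can split text that looks contiguous over several runs (for example around proofing marks or edits made at different times), and then the replacement silently finds nothing. A small check like the hypothetical `missing_keys` below can flag that before saving; the sample archive is built in memory as a stand-in for a real .docx so the sketch runs on its own.

```python
import io
import zipfile


def missing_keys(docx_file, keys):
    """Return the keys that do not occur verbatim in word/document.xml.

    docx_file may be a path or a binary file-like object."""
    with zipfile.ZipFile(docx_file) as z:
        xml = z.read("word/document.xml").decode("utf-8")
    return [key for key in keys if key not in xml]


# Tiny in-memory stand-in for a .docx: "{name}" is split over two runs
doc_xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p>"
    "<w:r><w:t>{order_id}</w:t></w:r>"
    "<w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", doc_xml)

print(missing_keys(buf, ["{order_id}", "{name}"]))  # ['{name}']
```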
    
  • 2021-01-02 07:54

    It seems that docx for Python is not meant to store a full docx file with images, headers, etc.; it only handles the inner content of the document, so there's no simple way to do this.

    However, here is how you could do it:

    First, have a look at the docx tag wiki:

    It explains how a docx file can be unzipped. Here's what a typical file looks like:

    +--docProps
    |  +  app.xml
    |  \  core.xml
    +  res.log
    +--word //this folder contains most of the files that control the content of the document
    |  +  document.xml //Is the actual content of the document
    |  +  endnotes.xml
    |  +  fontTable.xml
    |  +  footer1.xml //Contains the elements in the footer of the document
    |  +  footnotes.xml
    |  +--media //This folder contains all images embedded in the Word document
    |  |  \  image1.jpeg
    |  +  settings.xml
    |  +  styles.xml
    |  +  stylesWithEffects.xml
    |  +--theme
    |  |  \  theme1.xml
    |  +  webSettings.xml
    |  \--_rels
    |     \  document.xml.rels //this file tells Word where the images are located
    +  [Content_Types].xml
    \--_rels
       \  .rels
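As the listing shows, visible text is not only in document.xml: footers (and headers, footnotes, endnotes) live in their own parts, so a replacement that only touches word/document.xml will miss them. A minimal sketch for picking those parts out of an archive; the regular expression is an assumption based on how Word typically names these parts (header1.xml, footer1.xml, ...), since part names are really determined by the relationships in the _rels files.

```python
import re
import zipfile

# Assumed naming pattern for parts that usually hold document text
TEXT_PART = re.compile(r"word/(document|header\d*|footer\d*|footnotes|endnotes)\.xml")


def text_parts(names):
    """Filter a list of archive member names down to the text-bearing parts."""
    return [name for name in names if TEXT_PART.fullmatch(name)]


def docx_text_parts(path):
    """List the text-bearing parts of the .docx file at path."""
    with zipfile.ZipFile(path) as z:
        return text_parts(z.namelist())


names = ["docProps/app.xml", "word/document.xml", "word/footer1.xml",
         "word/footnotes.xml", "word/media/image1.jpeg", "word/_rels/document.xml.rels"]
print(text_parts(names))
# ['word/document.xml', 'word/footer1.xml', 'word/footnotes.xml']
```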
    

    The docx module only reads one part of the document, in its opendocx function:

    import zipfile
    from lxml import etree

    def opendocx(file):
        '''Open a docx file, return a document XML tree'''
        mydoc = zipfile.ZipFile(file)
        xmlcontent = mydoc.read('word/document.xml')
        document = etree.fromstring(xmlcontent)
        return document
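For reading only, the same thing can be done with the standard library. This sketch swaps lxml for `xml.etree.ElementTree` (my substitution, not what the docx module uses) and collects the text of every `<w:t>` element in document order; `document_texts` is a hypothetical name and the in-memory archive stands in for a real file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (bound to the "w:" prefix in document.xml)
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def document_texts(docx_file):
    """Parse word/document.xml and return the text of each <w:t> element in order."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W_NS}}}t")]


# Minimal in-memory stand-in for a .docx with two runs
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        "<w:r><w:t>Hello </w:t></w:r><w:r><w:t>world</w:t></w:r>"
        "</w:p></w:body></w:document>",
    )

print(document_texts(buf))  # ['Hello ', 'world']
```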
    

    It only gets the document.xml file.

    What I recommend is:

    1. Get the content of the document with opendocx.
    2. Do the replacement on that XML tree with advReplace.
    3. Open the docx as a zip and replace the content of document.xml with the new XML.
    4. Close and write out the zipped file (renaming it to output.docx).
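The steps above can be sketched with the standard library only. This is not the docx module's code: instead of opendocx/advReplace it (by my own choice) uses a regular expression to replace text only inside the contents of `<w:t>` elements, so tag names and attributes are never touched and the rest of the XML stays byte-for-byte, and it escapes the values before inserting them. It assumes the usual `w:` prefix and UTF-8 encoding that Word writes, and it still cannot match a placeholder split over several `<w:t>` elements. `replace_in_docx` and `W_T` are hypothetical names.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# One <w:t> or <w:t xml:space="preserve"> element; group 2 is its text content
W_T = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def replace_in_docx(src, dst, rep):
    """Replace keys of rep inside <w:t> text of word/document.xml, copy the rest."""
    with zipfile.ZipFile(src) as zin:
        # Step 1: get the document content
        xml = zin.read("word/document.xml").decode("utf-8")

        # Step 2: replace inside text nodes only; text in the XML is escaped,
        # so both the search key and the value are escaped the same way
        def fix_text(m):
            text = m.group(2)
            for old, new in rep.items():
                text = text.replace(escape(old), escape(str(new)))
            return m.group(1) + text + m.group(3)

        xml = W_T.sub(fix_text, xml)

        # Steps 3-4: write a new zip with the new document.xml and all other parts
        with zipfile.ZipFile(dst, "w") as zout:
            for item in zin.infolist():
                if item.filename == "word/document.xml":
                    data = xml.encode("utf-8")
                else:
                    data = zin.read(item.filename)
                zout.writestr(item, data)


# Usage with an in-memory stand-in for a .docx
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as z:
    z.writestr("word/document.xml",
               '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
               '<w:body><w:p><w:r><w:t xml:space="preserve">Hi {name}!</w:t></w:r></w:p></w:body></w:document>')
    z.writestr("word/styles.xml", "<w:styles/>")
dst = io.BytesIO()
replace_in_docx(src, dst, {"{name}": "Tom & Jerry"})
with zipfile.ZipFile(dst) as z:
    print(z.read("word/document.xml").decode("utf-8"))
```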

    If you have Node.js installed, note that I have worked on DocxGenJS, a templating engine for docx documents; the library is in active development and will be released soon as a Node module.
