How to use python-docx to replace text in a Word document and save

后悔当初 2020-11-29 19:25

The oodocx module mentioned in the same page refers the user to an /examples folder that does not seem to be there.
I have read the documentation of python-docx 0.7.2, p

7 answers
  • 2020-11-29 19:56

    The problem with your second attempt is that you haven't defined the parameters that savedocx needs. You need to do something like this before you save:

    relationships = docx.relationshiplist()
    title = "Document Title"
    subject = "Document Subject"
    creator = "Document Creator"
    keywords = []
    
    coreprops = docx.coreproperties(title=title, subject=subject, creator=creator,
                           keywords=keywords)
    app = docx.appproperties()
    content = docx.contenttypes()
    web = docx.websettings()
    word = docx.wordrelationships(relationships)
    output = r"path\to\where\you\want\to\save"
    
  • 2020-11-29 20:05

    The Office Dev Centre has an entry in which a developer has published (MIT-licensed at this time) a description of a couple of algorithms that appear to offer a solution for this (albeit in C#, so they would require porting): MS Dev Centre posting

  • 2020-11-29 20:09

    The current version of python-docx does not have a search() function or a replace() function. These are requested fairly frequently, but an implementation for the general case is quite tricky and it hasn't risen to the top of the backlog yet.

    Several folks have had success though, getting done what they need, using the facilities already present. Here's an example. It has nothing to do with sections by the way :)

    for paragraph in document.paragraphs:
        if 'sea' in paragraph.text:
            print(paragraph.text)
            paragraph.text = 'new text containing ocean'
    

    To search in Tables as well, you would need to use something like:

    for table in document.tables:
        for cell in table.cells:
            for paragraph in cell.paragraphs:
                if 'sea' in paragraph.text:
                   ...
    

    If you pursue this path, you'll probably discover pretty quickly what the complexities are. If you replace the entire text of a paragraph, that will remove any character-level formatting, like a word or phrase in bold or italic.

    By the way, the code from @wnnmaw's answer is for the legacy version of python-docx and won't work at all with versions after 0.3.0.

  • 2020-11-29 20:13

    For the table case, I had to modify @scanny's answer to:

    for table in doc.tables:
        for col in table.columns:
            for cell in col.cells:
                for p in cell.paragraphs:
                    ...  # search or replace in p.text here
    

    to make it work. Indeed, this does not seem to work with the current state of the API:

    for table in document.tables:
        for cell in table.cells:
    

    Same problem with the code from here: https://github.com/python-openxml/python-docx/issues/30#issuecomment-38658149
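The row-based loops used elsewhere in this thread can be wrapped in a small generator so that body paragraphs, table cells and nested tables are all handled the same way. This is a minimal sketch, not part of python-docx: `iter_paragraphs` is a hypothetical helper, and it only assumes that the object passed in exposes `.paragraphs` and `.tables`, as a python-docx document and a table cell both do.

```python
def iter_paragraphs(container):
    """Yield every paragraph in `container`, descending into tables.

    `container` is any object with `.paragraphs` and `.tables`
    (for example a python-docx document or a table cell).
    """
    for paragraph in container.paragraphs:
        yield paragraph
    for table in container.tables:
        for row in table.rows:
            for cell in row.cells:
                # A cell can hold paragraphs and further tables, so recurse.
                yield from iter_paragraphs(cell)
```

Typical use would be `for p in iter_paragraphs(Document("in.docx")): ...`. As far as I know, python-docx reports a merged cell once per grid position it covers, so paragraphs in merged cells may be yielded more than once; keep the replacement idempotent if that matters.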

  • 2020-11-29 20:14

    The python-docx API has changed again...

    For the sanity of everyone coming here:

    import datetime
    import os
    from decimal import Decimal
    from typing import NamedTuple
    
    from docx import Document
    from docx.document import Document as nDocument
    
    
    class DocxInvoiceArg(NamedTuple):
      invoice_to: str
      date_from: str
      date_to: str
      project_name: str
      quantity: float
      hourly: int
      currency: str
      bank_details: str
    
    
    class DocxService():
      tokens = [
        '@INVOICE_TO@',
        '@IDATE_FROM@',
        '@IDATE_TO@',
        '@INVOICE_NR@',
        '@PROJECTNAME@',
        '@QUANTITY@',
        '@HOURLY@',
        '@CURRENCY@',
        '@TOTAL@',
        '@BANK_DETAILS@',
      ]
    
      def __init__(self, replace_vals: DocxInvoiceArg):
        total = replace_vals.quantity * replace_vals.hourly
        invoice_nr = replace_vals.project_name + datetime.datetime.strptime(replace_vals.date_to, '%Y-%m-%d').strftime('%Y%m%d')
        self.replace_vals = [
          {'search': self.tokens[0], 'replace': replace_vals.invoice_to },
          {'search': self.tokens[1], 'replace': replace_vals.date_from },
          {'search': self.tokens[2], 'replace': replace_vals.date_to },
          {'search': self.tokens[3], 'replace': invoice_nr },
          {'search': self.tokens[4], 'replace': replace_vals.project_name },
          {'search': self.tokens[5], 'replace': replace_vals.quantity },
          {'search': self.tokens[6], 'replace': replace_vals.hourly },
          {'search': self.tokens[7], 'replace': replace_vals.currency },
          {'search': self.tokens[8], 'replace': total },
          {'search': self.tokens[9], 'replace': replace_vals.bank_details },
        ]
        self.doc_path_template = os.path.dirname(os.path.realpath(__file__))+'/docs/'
        self.doc_path_output = self.doc_path_template + 'output/'
        self.document: nDocument = Document(self.doc_path_template + 'invoice_placeholder.docx')
    
    
      def save(self):
        for p in self.document.paragraphs:
          self._docx_replace_text(p)
        tables = self.document.tables
        self._loop_tables(tables)
        self.document.save(self.doc_path_output + 'testiboi3.docx')
    
      def _loop_tables(self, tables):
        for table in tables:
          for index, row in enumerate(table.rows):
            for cell in table.row_cells(index):
              if cell.tables:
                self._loop_tables(cell.tables)
              for p in cell.paragraphs:
                self._docx_replace_text(p)
    
            # for cells in column.
            # for cell in table.columns:
    
      def _docx_replace_text(self, p):
        print(p.text)
        for el in self.replace_vals:
          if (el['search'] in p.text):
            inline = p.runs
            # Loop added to work with runs (strings with same style)
            for i in range(len(inline)):
              print(inline[i].text)
              if el['search'] in inline[i].text:
                text = inline[i].text.replace(el['search'], str(el['replace']))
                inline[i].text = text
            print(p.text)
    

    Test case:

    from django.test import SimpleTestCase
    from docx.table import Table, _Rows
    
    from toggleapi.services.DocxService import DocxService, DocxInvoiceArg
    
    
    class TestDocxService(SimpleTestCase):
    
      def test_document_read(self):
        ds = DocxService(DocxInvoiceArg(invoice_to="""
        WAW test1
        Multi myfriend
        """,date_from="2019-08-01", date_to="2019-08-30", project_name='WAW', quantity=10.5, hourly=40, currency='USD',bank_details="""
        Paypal to:
        bippo@bippsi.com"""))
    
        ds.save()
    

    You need the folders docs/ and docs/output/ in the same folder as DocxService.py.

    Be sure to parameterize the paths and file names and replace the placeholder values.
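As one way to parameterize the replacement values, the token list and the index-based lookups in the class above could be collapsed into a single mapping. This is only a sketch: `InvoiceArgs` and `build_replacements` are hypothetical names that mirror `DocxInvoiceArg`, and the token strings are copied from the answer.

```python
import datetime
from typing import Dict, NamedTuple


class InvoiceArgs(NamedTuple):
    """Hypothetical stand-in with the same fields as DocxInvoiceArg."""
    invoice_to: str
    date_from: str
    date_to: str
    project_name: str
    quantity: float
    hourly: int
    currency: str
    bank_details: str


def build_replacements(args: InvoiceArgs) -> Dict[str, str]:
    """Map each placeholder token to the string that should replace it."""
    total = args.quantity * args.hourly
    # Invoice number = project name + end date as YYYYMMDD, as in the answer.
    date_to = datetime.datetime.strptime(args.date_to, "%Y-%m-%d")
    invoice_nr = args.project_name + date_to.strftime("%Y%m%d")
    values = {
        "@INVOICE_TO@": args.invoice_to,
        "@IDATE_FROM@": args.date_from,
        "@IDATE_TO@": args.date_to,
        "@INVOICE_NR@": invoice_nr,
        "@PROJECTNAME@": args.project_name,
        "@QUANTITY@": args.quantity,
        "@HOURLY@": args.hourly,
        "@CURRENCY@": args.currency,
        "@TOTAL@": total,
        "@BANK_DETAILS@": args.bank_details,
    }
    # Run text must be a string, so convert numbers once here.
    return {token: str(value) for token, value in values.items()}
```

With a mapping like this, the replacement loop can iterate over `replacements.items()` instead of a list of `{'search': ..., 'replace': ...}` dicts, and a token can no longer drift out of step with its index.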

  • 2020-11-29 20:15

    I needed something to replace regular-expression matches in a docx file. I started from scanny's answer. To preserve style, I used the answer from "Python docx Replace string in paragraph while keeping style", added a recursive call to handle nested tables, and came up with something like this:

    import re
    from docx import Document
    
    def docx_replace_regex(doc_obj, regex , replace):
    
        for p in doc_obj.paragraphs:
            if regex.search(p.text):
                inline = p.runs
                # Loop added to work with runs (strings with same style)
                for i in range(len(inline)):
                    if regex.search(inline[i].text):
                        text = regex.sub(replace, inline[i].text)
                        inline[i].text = text
    
        for table in doc_obj.tables:
            for row in table.rows:
                for cell in row.cells:
                    docx_replace_regex(cell, regex , replace)
    
    
    
    regex1 = re.compile(r"your regex")
    replace1 = r"your replace string"
    filename = "test.docx"
    doc = Document(filename)
    docx_replace_regex(doc, regex1 , replace1)
    doc.save('result1.docx')
    

    To iterate over a dictionary of replacements:

    for word, replacement in dictionary.items():
        word_re=re.compile(word)
        docx_replace_regex(doc, word_re , replacement)
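One caveat with the loop above: `re.compile(word)` treats each dictionary key as a regular expression, so keys containing characters such as `.`, `(` or `+` will not match literally. If the keys are meant as plain strings, a sketch like the following escapes them and applies all replacements in a single pass. `compile_literal_map` and `replace_literals` are hypothetical helper names; only the standard `re` module is used.

```python
import re


def compile_literal_map(replacements):
    """Build one regex that matches any key of `replacements` literally."""
    # Longest keys first, so "seashore" wins over "sea" at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))


def replace_literals(text, replacements):
    """Replace every key found in `text` with its value, in one pass."""
    if not replacements:
        return text
    pattern = compile_literal_map(replacements)
    # The callback looks up the replacement for whichever key matched.
    return pattern.sub(lambda match: replacements[match.group(0)], text)
```

Because `regex.sub` accepts a callable as the replacement, the same pattern and callback should also work with the function above, e.g. `docx_replace_regex(doc, compile_literal_map(d), lambda m: d[m.group(0)])`. A single pass also means that replaced text is not scanned again by a later dictionary entry.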
    

    Note that this solution replaces a match only if the whole match lies within a single run, that is, if the matched text has the same style throughout.

    Also, if the text was edited after it was first saved, text with the same style may still be split into separate runs. For example, if you open a document that contains the string "testabcd", change it to "test1abcd" and save, then even though the style is the same there are three separate runs, "test", "1" and "abcd"; in this case replacing test1 won't work.

    Word splits the text this way to track changes in the document. To merge it into one run, in Word go to "Options", "Trust Center" and, under "Privacy Options", untick "Store random numbers to improve combine accuracy", then save the document.
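If changing the Word setting is not an option, the split-run case can also be handled in code. The sketch below is one possible approach, not something python-docx provides: `replace_across_runs` is a hypothetical helper that searches the concatenated run texts, writes the replacement into the run where the match starts, and trims the matched characters from the following runs. It assumes only that the paragraph exposes `.runs` and that each run has a writable `.text`.

```python
def replace_across_runs(paragraph, old, new):
    """Replace `old` with `new` in `paragraph`, even if `old` spans runs.

    Returns the number of replacements made. The replacement text takes
    the formatting of the run in which the match starts.
    """
    if not old:
        return 0
    count = 0
    search_from = 0
    while True:
        runs = paragraph.runs
        full = "".join(run.text for run in runs)
        start = full.find(old, search_from)
        if start == -1:
            return count
        end = start + len(old)
        offset = 0  # position of the current run within `full`
        first = True
        for run in runs:
            text = run.text
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            if run_end <= start or run_start >= end:
                continue  # this run lies entirely outside the match
            # Part of this run that is covered by the match.
            lo = max(start, run_start) - run_start
            hi = min(end, run_end) - run_start
            if first:
                run.text = text[:lo] + new + text[hi:]
                first = False
            else:
                run.text = text[:lo] + text[hi:]
        count += 1
        # Continue after the inserted text so `new` is never re-matched.
        search_from = start + len(new)
```

For the example above, runs "test", "1" and "abcd" with `old="test1"` and `new="demo"` end up as "demo", "" and "abcd". Runs emptied this way are left in place, and any formatting that differed inside the matched span is lost, so treat this as a starting point to adapt.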
