Python docx Replace string in paragraph while keeping style

终归单人心 2020-12-01 07:16

I need help replacing a string in a Word document while keeping the formatting of the entire document.

I'm using python-docx, and after reading the documentation, it wo

3 answers
  • 2020-12-01 07:39

    I posted this question, even though I saw a few identical ones on here, because none of those (to my knowledge) solved the issue. One of them used the oodocx library, which I tried, but it did not work. So I found a workaround.

    The code is very similar, but the logic is: when I find a paragraph that contains the string I wish to replace, I add another loop over that paragraph's runs and do the replacement inside the run. This only works if the whole string I wish to replace has the same formatting, that is, if it sits inside a single run.

    from docx import Document

    def replace_string(filename):
        doc = Document(filename)
        for p in doc.paragraphs:
            if 'old text' in p.text:
                # Work run by run so each run keeps its own character formatting.
                # This only matches when 'old text' lies entirely inside one run.
                for run in p.runs:
                    if 'old text' in run.text:
                        run.text = run.text.replace('old text', 'new text')
                print(p.text)

        doc.save('dest1.docx')
        return 1
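
To illustrate the limitation mentioned above, here is a minimal sketch that runs without python-docx. `FakeRun` is a hypothetical stand-in for a run (only its `.text` attribute is modelled), and `replace_in_runs` is an illustrative helper name, not part of the library. It shows that a string can be present in the paragraph text while no single run contains it.

```python
from dataclasses import dataclass


@dataclass
class FakeRun:
    """Hypothetical stand-in for a python-docx run; only .text is modelled."""
    text: str


def replace_in_runs(runs, old, new):
    """Replace old with new inside each run separately; return how many runs changed."""
    changed = 0
    for run in runs:
        if old in run.text:
            run.text = run.text.replace(old, new)
            changed += 1
    return changed


# The string sits entirely inside one run, so it is replaced.
same_run = [FakeRun("Dear "), FakeRun("old text"), FakeRun(",")]
replace_in_runs(same_run, "old text", "new text")

# The string is split over two runs (for example, part of it is bold).
split_runs = [FakeRun("Dear old"), FakeRun(" text,")]
found_in_paragraph = "old text" in "".join(r.text for r in split_runs)
changed = replace_in_runs(split_runs, "old text", "new text")
```

In the second case the paragraph-level check succeeds but nothing is replaced, which is why the run-by-run approach needs the string to share one formatting.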
    
  • 2020-12-01 07:43
    from docx import Document

    document = Document('old.docx')

    # Map each placeholder to its replacement.
    dic = {'name': 'ahmed', 'me': 'zain'}
    for p in document.paragraphs:
        for run in p.runs:
            # Only a run whose entire text equals a key is replaced.
            if run.text in dic:
                run.text = dic[run.text]

    document.save('new.docx')
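
The snippet above replaces a run only when its whole text equals a key. If one wanted to replace keys that appear inside longer run text, chaining `str.replace` calls can interfere when one key occurs inside another key or inside a replacement value ('me' occurs in both 'name' and 'ahmed'). A possible sketch, using only the standard library and a hypothetical helper name `replace_many`, does all substitutions in one pass:

```python
import re


def replace_many(text, mapping):
    """Replace every key of mapping found in text, in a single pass."""
    if not mapping:
        return text
    # Longest keys first so that 'name' wins over 'me' at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


mapping = {"name": "ahmed", "me": "zain"}

# Chained replaces: the second replace also rewrites the 'me' inside 'ahmed'.
chained = "my name is me"
for key, value in mapping.items():
    chained = chained.replace(key, value)

# Single pass: replaced text is never scanned again.
single_pass = replace_many("my name is me", mapping)
```

With python-docx, the result could be assigned back with `run.text = replace_many(run.text, dic)`, still under the assumption that each key lies within one run.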
    
  • 2020-12-01 07:46

    This is what works for me to retain the text style when replacing text.

    Based on Alo's answer and the fact that the search text can be split over several runs, here's what worked for me to replace placeholder text in a template docx file. It checks all the document paragraphs and any table cell contents for the placeholders.

    Once the search text is found in a paragraph, it loops through its runs to identify which runs contain parts of the search text. It then inserts the replacement text in the first of those runs and blanks out the remaining search-text characters in the others.

    I hope this helps someone. Here's the gist if anyone wants to improve it

    Edit: I have subsequently discovered python-docx-template which allows jinja2 style templating within a docx template. Here's a link to the documentation

    python3 python-docx python-docx-template

    def docx_replace(doc, data):
        paragraphs = list(doc.paragraphs)
        for t in doc.tables:
            for row in t.rows:
                for cell in row.cells:
                    for paragraph in cell.paragraphs:
                        paragraphs.append(paragraph)
        for p in paragraphs:
            for key, val in data.items():
                key_name = '${{{}}}'.format(key) # I'm using placeholders in the form ${PlaceholderName}
                if key_name in p.text:
                    inline = p.runs
                    # Replace strings and retain the same style.
                    # The text to be replaced can be split over several runs so
                    # search through, identify which runs need to have text replaced
                    # then replace the text in those identified
                    started = False
                    key_index = 0
                    # found_runs is a list of (inline index, index of match, length of match)
                    found_runs = list()
                    found_all = False
                    replace_done = False
                    for i in range(len(inline)):
    
                        # case 1: found in single run so short circuit the replace
                        if key_name in inline[i].text and not started:
                            found_runs.append((i, inline[i].text.find(key_name), len(key_name)))
                            text = inline[i].text.replace(key_name, str(val))
                            inline[i].text = text
                            replace_done = True
                            found_all = True
                            break
    
                        if key_name[key_index] not in inline[i].text and not started:
                            # keep looking ...
                            continue
    
                        # case 2: search for partial text, find first run
                        if key_name[key_index] in inline[i].text and inline[i].text[-1] in key_name and not started:
                            # check sequence
                            start_index = inline[i].text.find(key_name[key_index])
                            check_length = len(inline[i].text)
                            for text_index in range(start_index, check_length):
                                if inline[i].text[text_index] != key_name[key_index]:
                                    # no match so must be false positive
                                    break
                            if key_index == 0:
                                started = True
                            chars_found = check_length - start_index
                            key_index += chars_found
                            found_runs.append((i, start_index, chars_found))
                            if key_index != len(key_name):
                                continue
                            else:
                                # found all chars in key_name
                                found_all = True
                                break
    
                        # case 3: search for partial text, find subsequent run
                        if key_name[key_index] in inline[i].text and started and not found_all:
                            # check sequence
                            chars_found = 0
                            check_length = len(inline[i].text)
                            for text_index in range(0, check_length):
                                if inline[i].text[text_index] == key_name[key_index]:
                                    key_index += 1
                                    chars_found += 1
                                else:
                                    break
                            # no match so must be end
                            found_runs.append((i, 0, chars_found))
                            if key_index == len(key_name):
                                found_all = True
                                break
    
                    if found_all and not replace_done:
                        for i, item in enumerate(found_runs):
                            index, start, length = item
                            run_text = inline[index].text
                            # Put the replacement in the first run and blank the matched
                            # characters in the rest. Slicing (rather than str.replace)
                            # touches only the matched span.
                            replacement = str(val) if i == 0 else ''
                            inline[index].text = run_text[:start] + replacement + run_text[start + length:]
                    # print(p.text)
    
    # usage
    import docx

    doc = docx.Document('path/to/template.docx')
    docx_replace(doc, dict(ItemOne='replacement text', ItemTwo="Some replacement text\nand some more"))
    doc.save('path/to/destination.docx')
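
The same idea (replacement text goes into the first matching run, matched characters are removed from the following runs) can also be sketched more compactly by searching the concatenated run text and mapping the match back to character offsets. This is an alternative sketch, not the code from the answer above. `FakeRun` and `replace_across_runs` are hypothetical names, and the function assumes only that each run exposes a readable and writable `.text`, as python-docx runs do.

```python
from dataclasses import dataclass


@dataclass
class FakeRun:
    """Hypothetical stand-in for a python-docx run; only .text is modelled."""
    text: str


def replace_across_runs(runs, old, new):
    """Replace each occurrence of old in the joined run text; return the count.

    The replacement is written into the run where the match starts, so it
    takes that run's formatting. Matched characters in later runs are removed.
    """
    if not old:
        return 0
    count = 0
    search_from = 0
    while True:
        texts = [run.text for run in runs]
        start = "".join(texts).find(old, search_from)
        if start == -1:
            return count
        end = start + len(old)
        offset = 0
        first = True
        for run, text in zip(runs, texts):
            run_start, run_end = offset, offset + len(text)
            offset = run_end
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue  # this run does not overlap the match
            a, b = lo - run_start, hi - run_start
            run.text = text[:a] + (new if first else "") + text[b:]
            first = False
        count += 1
        # Continue after the inserted text so it is never matched again.
        search_from = start + len(new)


runs = [FakeRun("Hello ${Na"), FakeRun("me}"), FakeRun("!")]
replaced = replace_across_runs(runs, "${Name}", "World")
```

With a real document this could be called as `replace_across_runs(paragraph.runs, '${Name}', 'World')` for each paragraph, including those in table cells. Runs that end up empty are left in place.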
    