Combine Word documents using python-docx

陌清茗 2020-12-06 05:54

I have a few Word files that each have specific content. I would like a snippet that shows me, or helps me figure out, how to combine the Word files into one file, while us

6 answers
  • 2020-12-06 06:28

    If your needs are simple, something like this might work:

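    from docx import Document

    source_document = Document('source.docx')
    target_document = Document()
    
    # Copies plain text only: run formatting, tables and images are lost.
    for paragraph in source_document.paragraphs:
        text = paragraph.text
        target_document.add_paragraph(text)

    # 'target.docx' is just an example output name.
    target_document.save('target.docx')
    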

    There are additional things you can do, but that should get you started.

    It turns out that copying content from one Word file to another is quite complex in the general case; for example, styles present in the source document may conflict with those in the target document and have to be reconciled. So it's not a feature we're likely to add in the next year or so.

  • 2020-12-06 06:30

    Create an empty document (empty.docx) and add your two documents to it. On each iteration over the files, add a page break if necessary.

    When finished, save the new file, which contains your two combined files.

    from docx import Document
    
    files = ['file1.docx', 'file2.docx']
    
    def combine_word_documents(files):
        combined_document = Document('empty.docx')
        count, number_of_files = 0, len(files)
        for file in files:
            sub_doc = Document(file)
    
            # Don't add a page break if you've
            # reached the last file.
            if count < number_of_files - 1:
                sub_doc.add_page_break()
    
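            # _document_part and body._element are private attributes of an
            # older python-docx release; they do not exist in python-docx 0.8+
            # (see the next answer for an updated version).
            for element in sub_doc._document_part.body._element:
                combined_document._document_part.body._element.append(element)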
            count += 1
    
        combined_document.save('combined_word_documents.docx')
    
    combine_word_documents(files)
    
  • 2020-12-06 06:41

    I've adjusted the example above to work with the latest version of python-docx (0.8.6 at the time of writing). Note that this just copies the elements (merging styles of elements is more complicated to do):

    from docx import Document
    
    files = ['file1.docx', 'file2.docx']
    
    def combine_word_documents(files):
        merged_document = Document()
    
        for index, file in enumerate(files):
            sub_doc = Document(file)
    
            # Don't add a page break if you've reached the last file.
            if index < len(files) - 1:
                sub_doc.add_page_break()
    
            for element in sub_doc.element.body:
                merged_document.element.body.append(element)
    
        merged_document.save('merged.docx')
    
    combine_word_documents(files)
    
  • 2020-12-06 06:45

    An alternative approach that merges two documents including all their styles is to use the Python library docxcompose (https://pypi.org/project/docxcompose/). You don't need to define the styling explicitly, and you don't have to read the document paragraph by paragraph and append it to the master document. The usage of docxcompose is shown in the code below:

    # Import the required packages
    from docxcompose.composer import Composer
    from docx import Document as Document_compose

    # filename_master is the name of the file you want to merge the docx file into
    filename_master = "file1.docx"       # example name
    # filename_second_docx is the name of the second docx file
    filename_second_docx = "file2.docx"  # example name

    master = Document_compose(filename_master)
    composer = Composer(master)
    doc2 = Document_compose(filename_second_docx)
    # Append doc2 to the master using composer.append
    composer.append(doc2)
    # Save the combined docx under a new name
    composer.save("combined.docx")
    

    If you want to merge multiple documents into one .docx file, you can use the function below:

    
    from docxcompose.composer import Composer
    from docx import Document as Document_compose

    # filename_master is the name of the file you want to merge all the documents into
    # files_list is a list containing the filenames of all the docx files to be merged
    def combine_all_docx(filename_master, files_list):
        master = Document_compose(filename_master)
        composer = Composer(master)
        for filename in files_list:
            doc_temp = Document_compose(filename)
            composer.append(doc_temp)
        composer.save("combined_file.docx")

    # For example:
    # filename_master = "file1.docx"
    # files_list = ["file2.docx", "file3.docx", "file4.docx", "file5.docx"]
    # combine_all_docx(filename_master, files_list)
    # This appends every document in files_list to the content of file1.docx and
    # saves the merged document as combined_file.docx (file1.docx itself is unchanged).
    
  • 2020-12-06 06:49

    This is all very useful. I combined the answers of Martijn Jacobs and Mr Kriss.

    import os

    from docx import Document
    from django.conf import settings  # only needed for the Django-specific lines below

    def combine_word_documents(input_files):
        """
        :param input_files: a list with full paths to docs
        :return: a Document object with the merged files
        """
        for filnr, file in enumerate(input_files):
            # In my case the docx templates are in a Django FileField, so I prepend
            # MEDIA_ROOT; discard the next 2 lines if that doesn't apply to you.
            if 'offerte_template' in file:
                file = os.path.join(settings.MEDIA_ROOT, file)
    
            if filnr == 0:
                merged_document = Document(file)
                # Only add a page break if more files follow.
                if len(input_files) > 1:
                    merged_document.add_page_break()
    
            else:
                sub_doc = Document(file)
    
                # Don't add a page break if you've reached the last file.
                if filnr < len(input_files) - 1:
                    sub_doc.add_page_break()
    
                for element in sub_doc.element.body:
                    merged_document.element.body.append(element)
    
        return merged_document
    
  • 2020-12-06 06:50

    If you just need to combine simple documents with text, you can use python-docx as mentioned above.

    If you need to merge documents containing hyperlinks, images, lists, bullet points, etc., you can do this by using lxml to combine the document body and all the referenced parts, such as:

    • word/styles.xml
    • word/numbering.xml
    • word/media
    • [Content_Types].xml

    etc.
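    As a first step toward that approach, it helps to see that a .docx file is a ZIP package whose parts can be read directly. The sketch below only inspects a package (it does not merge anything) and uses the standard library's `zipfile` and `xml.etree.ElementTree` instead of lxml to stay dependency-free. The function name `inspect_docx` is hypothetical, and including `word/_rels/document.xml.rels` (which maps the relationship ids used by images and hyperlinks to their targets) is my own addition to the list above.

```python
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Shared parts a full merge would have to reconcile (the list above, plus
# the document relationships part, which is an addition of this sketch).
SHARED_PARTS = (
    "word/styles.xml",
    "word/numbering.xml",
    "[Content_Types].xml",
    "word/_rels/document.xml.rels",
)


def inspect_docx(path):
    """Hypothetical helper: return (shared parts present, body element count)."""
    with zipfile.ZipFile(path) as package:
        names = package.namelist()
        document_xml = package.read("word/document.xml")
    body = ET.fromstring(document_xml).find(f"{{{W_NS}}}body")
    parts = [n for n in names if n in SHARED_PARTS or n.startswith("word/media/")]
    # Block-level children of <w:body>: paragraphs, tables and the final sectPr.
    return parts, len(body)
```

    Every part this returns would need its own merge step (renumbering style ids, numbering ids and relationship ids), which is why the simple body-copying answers above lose images and list formatting.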
