Merge multiple Word documents into one with Open XML

余生分开走 2020-11-30 01:57

I have around 10 Word documents which I generate using Open XML and other stuff. Now I would like to create another Word document and join them into it one by one.

4 Answers
  • 2020-11-30 02:34

    The only thing missing from the other answers is the loop over all the files.

    For those who just want to copy/paste it:

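    using System.Collections.Generic;
    using System.IO;
    using DocumentFormat.OpenXml;               // WordprocessingDocumentType
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    void MergeInNewFile(string resultFile, IList<string> filenames)
    {
        using (WordprocessingDocument document = WordprocessingDocument.Create(resultFile, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = document.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());
    
            foreach (string filename in filenames)
            {
                // One import part per source file; the SDK generates a unique relationship id
                AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
                string altChunkId = mainPart.GetIdOfPart(chunk);
    
                // Read-only access is enough to copy the source document into the part
                using (FileStream fileStream = File.OpenRead(filename))
                {
                    chunk.FeedData(fileStream);
                }
    
                // Reference the import part from the body, in file order
                AltChunk altChunk = new AltChunk { Id = altChunkId };
                mainPart.Document.Body.AppendChild(altChunk);
            }
    
            mainPart.Document.Save();
        }
    }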
    

    All credit goes to Chris and yonexbat.

  • 2020-11-30 02:36

    There is a nice wrapper API around Open XML (Document Builder 2.2) designed specifically for merging documents, with the flexibility to choose which paragraphs to merge, etc. You can download it from here (update: it has moved to GitHub).

    The documentation and screencasts on how to use it are here.

    Update: Code Sample

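    using System.Collections.Generic;
    using System.IO;
    using OpenXmlPowerTools; // namespace of DocumentBuilder, Source and WmlDocument

    var sources = new List<Source>();
    // documentstreams: the document streams (e.g. FileStreams) of the documents to be merged
    foreach (var stream in documentstreams)
    {
        var tempms = new MemoryStream();
        stream.CopyTo(tempms); // copies from the stream's current position
        // First argument is only used as the document's name; true = keepSections
        sources.Add(new Source(new WmlDocument(stream.Length.ToString(), tempms), true));
    }

    var mergedDoc = DocumentBuilder.BuildDocument(sources);
    mergedDoc.SaveAs(@"C:\TargetFilePath");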
    

    The types Source and WmlDocument come from the Document Builder API.

    You can also add the file paths directly if you prefer:

    sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged1.docx")));
    sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged2.docx")));
    

    I also found this nice comparison between the AltChunk and Document Builder approaches to merging documents; it helps in choosing one based on your requirements.

    You can also use the DocX library to merge documents, but I prefer Document Builder for this.

    Hope this helps.

  • 2020-11-30 02:49

    Using only the Open XML SDK, you can use the AltChunk element to merge multiple documents into one.

    These two articles provide some samples: the-easy-way-to-assemble-multiple-word-documents and How to Use altChunk for Document Assembly.

    EDIT 1

    Based on the altChunk code in your updated question (update #1), here is the VB.Net code I have tested; it works like a charm for me:

    Using myDoc = DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open("D:\Test.docx", True)
        ' Only one chunk is added here; with several chunks this id repeats (see EDIT 2)
        Dim altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2)
        Dim mainPart = myDoc.MainDocumentPart
        ' Import part that holds the content of Test1.docx
        Dim chunk = mainPart.AddAlternativeFormatImportPart(
            DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML, altChunkId)
        Using fileStream As IO.FileStream = IO.File.Open("D:\Test1.docx", IO.FileMode.Open)
            chunk.FeedData(fileStream)
        End Using
        ' AltChunk that references the import part, placed after the last paragraph
        Dim altChunk = New DocumentFormat.OpenXml.Wordprocessing.AltChunk()
        altChunk.Id = altChunkId
        mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of DocumentFormat.OpenXml.Wordprocessing.Paragraph).Last())
        mainPart.Document.Save()
    End Using
    

    EDIT 2

    The second issue (update #2), "This code is appending the Test2 data twice, in place of Test1 data as well", is related to the altChunkId.

    For each document you want to merge into the main document, you need to:

    1. add an AlternativeFormatImportPart to the MainDocumentPart with an Id that must be unique. This part contains the inserted data;
    2. add an AltChunk element to the body and set its Id to reference that AlternativeFormatImportPart.

    In your code, you are using the same Id for all the AltChunks. That's why you see the same text many times.
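    To confirm this on a generated file, here is a minimal sketch that opens the .docx as a plain zip archive (System.IO.Compression and LINQ to XML only, not the Open XML SDK) and reports every r:id shared by more than one w:altChunk. It assumes the main part sits at the conventional path word/document.xml, and DuplicateAltChunkIds is a hypothetical helper name; the usage part builds a tiny in-memory package instead of reading a real file.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns every r:id referenced by more than one
// w:altChunk element in the main document part of a .docx package.
static string[] DuplicateAltChunkIds(Stream docx)
{
    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    XNamespace r = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    using var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true);
    // Assumption: the main part is at the conventional path word/document.xml.
    ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
        ?? throw new InvalidDataException("word/document.xml not found");
    using Stream xml = entry.Open();

    return XDocument.Load(xml)
        .Descendants(w + "altChunk")
        .Select(chunk => (string)chunk.Attribute(r + "id"))
        .GroupBy(id => id)
        .Where(group => group.Count() > 1)
        .Select(group => group.Key)
        .ToArray();
}

// Usage: a tiny in-memory package whose body has two altChunks sharing one id.
var package = new MemoryStream();
using (var zip = new ZipArchive(package, ZipArchiveMode.Create, leaveOpen: true))
{
    using var writer = new StreamWriter(zip.CreateEntry("word/document.xml").Open());
    writer.Write(
        "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"" +
        " xmlns:r=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships\">" +
        "<w:body><w:altChunk r:id=\"AltChunkId63\"/><w:altChunk r:id=\"AltChunkId63\"/></w:body>" +
        "</w:document>");
}
package.Position = 0;

string[] duplicates = DuplicateAltChunkIds(package);
Console.WriteLine(string.Join(", ", duplicates)); // AltChunkId63
```

    Reading the raw XML only makes sense before Word re-saves the file, since Word replaces the altChunks with regular content when it saves.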

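    The altChunkId will not be unique with your code: string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2); Substring(0, 2) keeps only the first two digits of the 18-digit tick count, and those change only about once every 30 years, so every call produces the same Id.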
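    A small sketch makes the collision concrete. It applies the question's id scheme to two fixed timestamps a day apart (fixed rather than DateTime.Now so the run is repeatable). It also shows a hypothetical index-based alternative for cases where you do want explicit ids. TruncatedTicksId and IndexedId are illustrative names, not SDK APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reproduces the question's scheme: keep only the first two digits of the tick count.
// TruncatedTicksId is a hypothetical name used for illustration.
static string TruncatedTicksId(DateTime when) =>
    "AltChunkId" + when.Ticks.ToString().Substring(0, 2);

// Hypothetical alternative: derive the id from the document's position in the merge loop.
static string IndexedId(int index) => "AltChunkId" + index;

// Two timestamps a full day apart still produce the same id.
string a = TruncatedTicksId(new DateTime(2020, 11, 30, 1, 57, 0));
string b = TruncatedTicksId(new DateTime(2020, 12, 1, 1, 57, 0));
Console.WriteLine($"{a} / {b}"); // AltChunkId63 / AltChunkId63

// Index-based ids for ten documents are all distinct.
List<string> ids = Enumerable.Range(0, 10).Select(i => IndexedId(i)).ToList();
Console.WriteLine(ids.Distinct().Count()); // 10
```

    Letting the SDK assign the id, as recommended below, avoids having to guarantee uniqueness yourself.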

    If you don't need a specific value, I recommend not setting the AltChunkId explicitly when you add the AlternativeFormatImportPart. Instead, let the SDK generate one, like this:

    VB.Net

    Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML)
    Dim altchunkid As String = mainPart.GetIdOfPart(chunk)
    

    C#

    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML);
    string altchunkid = mainPart.GetIdOfPart(chunk);
    
  • 2020-11-30 02:51

    Easy to use in C#:

    using System;
    using System.IO;
    using System.Linq;
    using DocumentFormat.OpenXml;                // OpenXmlElement
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    
    namespace WordMergeProject
    {
        public class Program
        {
            private static void Main(string[] args)
            {
                byte[] word1 = File.ReadAllBytes(@"..\..\word1.docx");
                byte[] word2 = File.ReadAllBytes(@"..\..\word2.docx");
    
                byte[] result = Merge(word1, word2);
    
                File.WriteAllBytes(@"..\..\word3.docx", result);
            }
    
            // Appends src to the end of dest, after a page break, and returns the merged .docx bytes.
            private static byte[] Merge(byte[] dest, byte[] src)
            {
                string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString();
    
                // Copy dest into an expandable MemoryStream (new MemoryStream(byte[]) cannot grow)
                var memoryStreamDest = new MemoryStream();
                memoryStreamDest.Write(dest, 0, dest.Length);
                memoryStreamDest.Seek(0, SeekOrigin.Begin);
                var memoryStreamSrc = new MemoryStream(src);
    
                using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStreamDest, true))
                {
                    MainDocumentPart mainPart = doc.MainDocumentPart;
                    AlternativeFormatImportPart altPart =
                        mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
                    altPart.FeedData(memoryStreamSrc);
                    var altChunk = new AltChunk();
                    altChunk.Id = altChunkId;
    
                    // Insert after the last AltChunk, or after the last paragraph if there is none yet
                    OpenXmlElement lastElem = mainPart.Document.Body.Elements<AltChunk>().LastOrDefault();
                    if (lastElem == null)
                    {
                        lastElem = mainPart.Document.Body.Elements<Paragraph>().Last();
                    }
    
                    // Insert a page break
                    Paragraph pageBreakP = new Paragraph();
                    Run pageBreakR = new Run();
                    Break pageBreakBr = new Break() { Type = BreakValues.Page };
    
                    pageBreakP.Append(pageBreakR);
                    pageBreakR.Append(pageBreakBr);
    
                    // Page break first, then the imported document
                    lastElem.InsertAfterSelf(pageBreakP);
                    pageBreakP.InsertAfterSelf(altChunk);
    
                    mainPart.Document.Save();
                }
    
                // Disposing the document has flushed the package into memoryStreamDest
                return memoryStreamDest.ToArray();
            }
        }
    }
    