Generating docx file from HTML file using OpenXML

别那么骄傲 2020-12-18 06:24

I'm using this method to generate a docx file:

public static void CreateDocument(string documentFileName, string text)
{
    using (Wordproces         


        
3 Answers
  • 2020-12-18 06:44

    I realize I'm 7 years late to the game here. Still, for future people searching on how to convert from HTML to Word Doc, this blog posting on a Microsoft MSDN site gives most of the ingredients necessary to do this using OpenXML. I found the post itself to be confusing, but the source code that he included clarified it all for me.

    The only piece that was missing was how to build a Docx file from scratch, instead of how to merge into an existing one as his example shows. I found that tidbit from here.

    Unfortunately, the project I used this in is written in VB.NET, so I'm going to share the VB.NET code first, then an automated C# conversion of it that may or may not be accurate.

    VB.NET code:

    Imports DocumentFormat.OpenXml
    Imports DocumentFormat.OpenXml.Packaging
    Imports DocumentFormat.OpenXml.Wordprocessing
    Imports System.IO
    
    Dim ms As IO.MemoryStream
    Dim mainPart As MainDocumentPart
    Dim b As Body
    Dim d As Document
    Dim chunk As AlternativeFormatImportPart
    Dim altChunk As AltChunk
    
    Const altChunkID As String = "AltChunkId1"
    
    ms = New MemoryStream()
    
    Using myDoc = WordprocessingDocument.Create(ms,WordprocessingDocumentType.Document)
        mainPart = myDoc.MainDocumentPart
    
        If mainPart Is Nothing Then
            mainPart = myDoc.AddMainDocumentPart()
    
            b = New Body()
            d = New Document(b)
            d.Save(mainPart)
        End If
    
        chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID)
    
        Using chunkStream As Stream = chunk.GetStream(FileMode.Create, FileAccess.Write)
            Using stringStream As StreamWriter = New StreamWriter(chunkStream)
                stringStream.Write("YOUR HTML HERE")
            End Using
        End Using
    
        altChunk = New AltChunk()
        altChunk.Id = altChunkID
        mainPart.Document.Body.InsertAt(Of AltChunk)(altChunk, 0)
        mainPart.Document.Save()
    End Using
    

    C# code:

    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;
    using System.IO;
    
    const string altChunkID = "AltChunkId1";
    
    // Not disposed here: the stream is handed to another routine that disposes it.
    MemoryStream ms = new MemoryStream();
    
    using (WordprocessingDocument myDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;
    
        // A newly created package has no main part yet, so add one with an empty body.
        if (mainPart == null)
        {
            mainPart = myDoc.AddMainDocumentPart();
            Body b = new Body();
            Document d = new Document(b);
            d.Save(mainPart);
        }
    
        // Store the HTML as an alternative-format part inside the package.
        AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID);
    
        using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
        using (StreamWriter stringStream = new StreamWriter(chunkStream))
        {
            stringStream.Write("YOUR HTML HERE");
        }
    
        // Reference the stored part from the start of the document body.
        AltChunk altChunk = new AltChunk();
        altChunk.Id = altChunkID;
        mainPart.Document.Body.InsertAt(altChunk, 0);
        mainPart.Document.Save();
    }
    

    Note that I'm using the ms memory stream in another routine, which is where it's disposed of after use.

    I hope this helps someone else!

  • 2020-12-18 06:48

    You cannot insert HTML content directly into "document.xml". That part accepts only WordprocessingML content, so you'll have to convert the HTML into WordprocessingML first, see this.
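
    To make the point above concrete, here is a minimal sketch of what "converting to WordprocessingML" means for a tiny fragment such as `<strong>Hello</strong> World`, written by hand with the Open XML SDK. The class name `WordprocessingMlSketch` and its `CreateDocument` helper are hypothetical, and real HTML needs a proper converter rather than hand-built runs; this only shows the kind of content the main document part expects.

    ```csharp
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    // Hypothetical helper class, for illustration only.
    public static class WordprocessingMlSketch
    {
        // Writes a docx whose body is the hand-built WordprocessingML
        // equivalent of "<strong>Hello</strong> World".
        public static void CreateDocument(string documentFileName)
        {
            using (WordprocessingDocument package =
                WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = package.AddMainDocumentPart();

                // Bold run: <w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>
                Run boldRun = new Run(
                    new RunProperties(new Bold()),
                    new Text("Hello"));

                // Plain run; xml:space="preserve" keeps the leading space.
                Run plainRun = new Run(
                    new Text(" World") { Space = SpaceProcessingModeValues.Preserve });

                // One paragraph holding both runs becomes the whole body.
                mainPart.Document = new Document(new Body(new Paragraph(boldRun, plainRun)));
                mainPart.Document.Save();
            }
        }
    }
    ```

    Calling `WordprocessingMlSketch.CreateDocument("hello.docx")` should produce a document with a single paragraph in which only "Hello" is bold. The file name is just an example.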

    Another option is the altChunk element. With it you can place an HTML file inside your DOCX file and then reference that HTML content at a specific place inside your document, see this.

    Lastly, as an alternative, you could accomplish exactly what you want with the GemBox.Document library, as in the following:

    public static void CreateDocument(string documentFileName, string text)
    {
        DocumentModel document = new DocumentModel();
        document.Content.LoadText(text, LoadOptions.HtmlDefault);
        document.Save(documentFileName);
    }
    

    Or you could convert the HTML content directly into a DOCX file:

    public static void Convert(string documentFileName, string htmlText)
    {
        HtmlLoadOptions options = LoadOptions.HtmlDefault;
        using (var htmlStream = new MemoryStream(options.Encoding.GetBytes(htmlText)))
            DocumentModel.Load(htmlStream, options)
                         .Save(documentFileName);
    }
    
  • 2020-12-18 06:50

    I successfully converted HTML content to a docx file using OpenXML in .NET Core with this code:

    string html = "<strong>Hello</strong> World";
    using (MemoryStream generatedDocument = new MemoryStream())
    {
        using (WordprocessingDocument package =
                   WordprocessingDocument.Create(generatedDocument,
                   WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = package.MainDocumentPart;
            if (mainPart == null)
            {
                mainPart = package.AddMainDocumentPart();
                new Document(new Body()).Save(mainPart);
            }

            // HtmlConverter is not part of the Open XML SDK; presumably it comes from the HtmlToOpenXml package.
            HtmlConverter converter = new HtmlConverter(mainPart);
            converter.ParseHtml(html);
            mainPart.Document.Save();
        }

        // The package is closed at this point; use generatedDocument here, as in the snippets that follow.
    }
    

    To save to disk:

    System.IO.File.WriteAllBytes("filename.docx", generatedDocument.ToArray());
    

    To return the file for download in ASP.NET Core MVC, use:

    return File(generatedDocument.ToArray(), 
              "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
              "filename.docx");
    