I'm using this method for generating a docx file:
public static void CreateDocument(string documentFileName, string text)
{
using (Wordproces
I realize I'm 7 years late to the game here. Still, for future people searching on how to convert from HTML to Word Doc, this blog posting on a Microsoft MSDN site gives most of the ingredients necessary to do this using OpenXML. I found the post itself to be confusing, but the source code that he included clarified it all for me.
The only piece that was missing was how to build a Docx file from scratch, instead of how to merge into an existing one as his example shows. I found that tidbit from here.
Unfortunately, the project I used this in is written in VB.NET, so I'm going to share the VB.NET code first, followed by an automated C# conversion of it that may or may not be accurate.
VB.NET code:
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Wordprocessing
Imports System.IO
Dim ms As IO.MemoryStream
Dim mainPart As MainDocumentPart
Dim b As Body
Dim d As Document
Dim chunk As AlternativeFormatImportPart
Dim altChunk As AltChunk
Const altChunkID As String = "AltChunkId1"
ms = New MemoryStream()
Using myDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document)
    mainPart = myDoc.MainDocumentPart
    If mainPart Is Nothing Then
        mainPart = myDoc.AddMainDocumentPart()
        b = New Body()
        d = New Document(b)
        d.Save(mainPart)
    End If
    chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID)
    Using chunkStream As Stream = chunk.GetStream(FileMode.Create, FileAccess.Write)
        Using stringStream As StreamWriter = New StreamWriter(chunkStream)
            stringStream.Write("YOUR HTML HERE")
        End Using
    End Using
    altChunk = New AltChunk()
    altChunk.Id = altChunkID
    mainPart.Document.Body.InsertAt(Of AltChunk)(altChunk, 0)
    mainPart.Document.Save()
End Using
C# code:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System.IO;
const string altChunkID = "AltChunkId1";
// The stream is not disposed here; the routine that consumes it does that.
MemoryStream ms = new MemoryStream();
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
{
    // A newly created package has no main part yet, so add one with an empty body.
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    if (mainPart == null)
    {
        mainPart = myDoc.AddMainDocumentPart();
        Body b = new Body();
        Document d = new Document(b);
        d.Save(mainPart);
    }
    // Store the HTML as an alternative-format part inside the package.
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    {
        using (StreamWriter stringStream = new StreamWriter(chunkStream))
        {
            stringStream.Write("YOUR HTML HERE");
        }
    }
    // Reference the stored part from the start of the document body.
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkID;
    mainPart.Document.Body.InsertAt<AltChunk>(altChunk, 0);
    mainPart.Document.Save();
}
Note that I'm using the ms memory stream in another routine, which is where it's disposed of after use.
I hope this helps someone else!
You cannot just insert HTML content into "document.xml". That part expects only WordprocessingML content, so you'll have to convert the HTML into WordprocessingML; see this.
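To make the difference concrete, here is a minimal sketch, assuming the Open XML SDK (DocumentFormat.OpenXml), of what the WordprocessingML equivalent of `<strong>Hello</strong> World` might look like when built by hand. The names `WordMlSketch` and `BuildHelloWorld` are hypothetical and only there for illustration; a real converter would have to map every HTML element this way.

```csharp
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class WordMlSketch
{
    // Hypothetical helper: builds a DOCX in memory whose body is a hand-written
    // WordprocessingML equivalent of "<strong>Hello</strong> World".
    public static byte[] BuildHelloWorld()
    {
        using (var ms = new MemoryStream())
        {
            using (var doc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
            {
                MainDocumentPart mainPart = doc.AddMainDocumentPart();

                // <strong>Hello</strong> becomes a run with bold run properties.
                var boldRun = new Run(
                    new RunProperties(new Bold()),
                    new Text("Hello"));

                // Plain text becomes an ordinary run; keep the leading space.
                var plainRun = new Run(
                    new Text(" World") { Space = SpaceProcessingModeValues.Preserve });

                // document.xml holds a Document > Body > Paragraph > Run tree, not HTML.
                mainPart.Document = new Document(new Body(new Paragraph(boldRun, plainRun)));
                mainPart.Document.Save();
            }
            // Read the bytes after the package is closed so the archive is complete.
            return ms.ToArray();
        }
    }
}
```

The returned bytes can be written to a .docx file, for example with `File.WriteAllBytes`.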
Another option is the altChunk element. With it you can place an HTML file inside your DOCX file and then reference that HTML content at a specific place in your document; see this.
Lastly, as an alternative, with the GemBox.Document library you could accomplish exactly what you want; see the following:
public static void CreateDocument(string documentFileName, string text)
{
    DocumentModel document = new DocumentModel();
    document.Content.LoadText(text, LoadOptions.HtmlDefault);
    document.Save(documentFileName);
}
Or you could convert the HTML content directly into a DOCX file:
public static void Convert(string documentFileName, string htmlText)
{
    HtmlLoadOptions options = LoadOptions.HtmlDefault;
    using (var htmlStream = new MemoryStream(options.Encoding.GetBytes(htmlText)))
        DocumentModel.Load(htmlStream, options)
            .Save(documentFileName);
}
I could successfully convert HTML content to a docx file using OpenXML in .NET Core with this code (HtmlConverter comes from the HtmlToOpenXml library):
string html = "<strong>Hello</strong> World";
using (MemoryStream generatedDocument = new MemoryStream())
{
    using (WordprocessingDocument package = WordprocessingDocument.Create(generatedDocument, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = package.MainDocumentPart;
        if (mainPart == null)
        {
            mainPart = package.AddMainDocumentPart();
            new Document(new Body()).Save(mainPart);
        }
        // Parse the HTML into WordprocessingML.
        HtmlConverter converter = new HtmlConverter(mainPart);
        converter.ParseHtml(html);
        mainPart.Document.Save();
    }
    // Use generatedDocument.ToArray() here, before the stream is disposed.
}
To save to disk (while generatedDocument is still in scope):
System.IO.File.WriteAllBytes("filename.docx", generatedDocument.ToArray());
To return the file for download in ASP.NET Core MVC, use:
return File(generatedDocument.ToArray(),
"application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"filename.docx");