I am using Word and OpenXml to provide mail merge functionality in a C# ASP.NET web application:
1) A document is uploaded with a number of pre-defined strings for s
I second the recommendation to use Content Controls. Using them to mark up the areas of your document where you want to perform substitution is by far the easiest way to do it.
As for duplicating the document (and retaining the entire document contents, styles and all), it's relatively easy:
// File.ReadAllBytes expects a file system path, not a URL
string documentPath = "full path to your document";
byte[] docAsArray = File.ReadAllBytes(documentPath);
using (MemoryStream stream = new MemoryStream())
{
    stream.Write(docAsArray, 0, docAsArray.Length); // THIS performs the doc copy
    using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
    {
        // perform content control substitution here, making sure to call .Save()
        // on the root element of any document part you changed.
    }
    // write the copy out only after the document above has been disposed
    File.WriteAllBytes("full path of your new doc to save, including .docx", stream.ToArray());
}
Actually finding the content controls is a piece of cake using LINQ. The following example finds all the inline content controls, such as Simple Text controls placed inside a paragraph (these are typed as SdtRun):
using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
{
    var mainDocument = doc.MainDocumentPart.Document;
    var contentControls = from sdt in mainDocument.Descendants<SdtRun>() select sdt;
    foreach (var cc in contentControls)
    {
        // drill down through the containment hierarchy to get to
        // the contained <Text> object
        cc.SdtContentRun.GetFirstChild<Run>().GetFirstChild<Text>().Text = "my replacement string";
    }
}
The <Run> and <Text> elements may not already exist, but creating them is as simple as:
cc.SdtContentRun.Append(new Run(new Text("my replacement string")));
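The two cases above (the <Run> and <Text> already exist, or they have to be created) can be folded into one small helper. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes you only care about the first run inside the control.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlText
{
    // Hypothetical helper (not part of the Open XML SDK): writes a value into
    // an inline content control, creating the missing elements on the way down.
    public static void SetText(SdtRun cc, string value)
    {
        // SdtContentRun is null when the control has no content element yet.
        var content = cc.SdtContentRun ?? cc.AppendChild(new SdtContentRun());

        // Reuse the first run and text if they exist, otherwise create them.
        var run = content.GetFirstChild<Run>() ?? content.AppendChild(new Run());
        var text = run.GetFirstChild<Text>() ?? run.AppendChild(new Text());

        // Only the first <Text> is touched; any further runs are left alone.
        text.Text = value;
    }
}
```

Inside the foreach loop shown earlier you would then call ContentControlText.SetText(cc, "my replacement string") instead of drilling down by hand.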
Hope that helps someone. :D
The original question was asked before a number of helpful features were added to the Open XML SDK. Nowadays, if you already have an opened WordprocessingDocument, you would simply clone the original document and perform whatever transformation on that clone.
// Say you have done this somewhere before you want to duplicate your document.
using WordprocessingDocument originalDoc = WordprocessingDocument.Open("original.docx", false);
// Then this is how you can clone the opened WordprocessingDocument.
using var newDoc = (WordprocessingDocument) originalDoc.Clone("copy.docx", true);
// Perform whatever transformation you want to do.
PerformTransformation(newDoc);
You can also clone on a Stream or Package. Overall, you have the following options:
OpenXmlPackage Clone()
OpenXmlPackage Clone(Stream stream)
OpenXmlPackage Clone(Stream stream, bool isEditable)
OpenXmlPackage Clone(Stream stream, bool isEditable, OpenSettings openSettings)
OpenXmlPackage Clone(string path)
OpenXmlPackage Clone(string path, bool isEditable)
OpenXmlPackage Clone(string path, bool isEditable, OpenSettings openSettings)
OpenXmlPackage Clone(Package package)
OpenXmlPackage Clone(Package package, OpenSettings openSettings)
Have a look at the Open XML SDK documentation for details on those methods.
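In a web application you often want the merged document as bytes rather than as a file on disk. Here is a rough sketch using the Clone(Stream stream, bool isEditable) overload from the list above. The MergeOutput class and the transform callback are hypothetical names, and the sketch assumes the clone is saved when it is disposed (the SDK's default AutoSave behaviour, as far as I know).

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class MergeOutput
{
    // Hypothetical helper: clones an already opened document into memory,
    // lets the caller modify the clone, and returns the result as bytes.
    public static byte[] CloneToBytes(
        WordprocessingDocument original,
        Action<WordprocessingDocument> transform)
    {
        using (var stream = new MemoryStream())
        {
            using (var clone = (WordprocessingDocument)original.Clone(stream, true))
            {
                transform(clone);
            }

            // Read the stream only after the clone has been disposed,
            // so that the package has been written out completely.
            return stream.ToArray();
        }
    }
}
```

The returned array can then be written to the HTTP response or stored, while the original document stays untouched.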
Having said that, if you have not yet opened the WordprocessingDocument, there are faster ways to duplicate, or clone, the document. I've demonstrated this in my answer on the most efficient way to clone Office Open XML documents.
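One simple option for the not-yet-opened case, which may or may not be the approach from that other answer, is to copy the file at the file system level and only then open the copy for editing. A minimal sketch, with a hypothetical DocxCopy helper:

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class DocxCopy
{
    // Hypothetical helper: duplicates the .docx as a plain file copy and
    // returns the copy opened for editing. The caller disposes the result.
    public static WordprocessingDocument CopyAndOpen(string sourcePath, string targetPath)
    {
        // No Open XML parsing happens here; this is a byte-for-byte copy.
        File.Copy(sourcePath, targetPath, overwrite: true);

        // Only the copy is opened in read/write mode.
        return WordprocessingDocument.Open(targetPath, true);
    }
}
```

Because the original is never opened by the SDK, all parts, styles and relationships come along unchanged.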
This piece of code should copy all parts from an existing document to a new one.
using (var mainDoc = WordprocessingDocument.Open(@"c:\sourcedoc.docx", false))
using (var resultDoc = WordprocessingDocument.Create(@"c:\newdoc.docx",
    WordprocessingDocumentType.Document))
{
    // copy parts from source document to new document
    foreach (var part in mainDoc.Parts)
        resultDoc.AddPart(part.OpenXmlPart, part.RelationshipId);
    // perform replacements in resultDoc.MainDocumentPart
    // ...
}
As an addendum to the above: what's perhaps more useful is finding content controls that have been tagged (using the Word GUI). I recently wrote some software that populated document templates that contained content controls with tags attached. Finding them is just an extension of the above LINQ query:
// W is the WordprocessingML namespace URI, needed to look up the w:val attribute
const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
var mainDocument = doc.MainDocumentPart.Document;
var taggedContentControls = from sdt in mainDocument.Descendants<SdtElement>()
                            let sdtPr = sdt.GetFirstChild<SdtProperties>()
                            let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
                            where (tag != null)
                            select new
                            {
                                SdtElem = sdt,
                                TagName = tag.GetAttribute("val", W).Value
                            };
I got this code from elsewhere but cannot remember where at the moment; full credit goes to them.
The query just creates an IEnumerable of an anonymous type that contains the content control and its associated tag as properties. Handy!
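For the mail merge case, the tag names can be matched against a dictionary of replacement values. The following is only a sketch under a few assumptions: TagMerge and Fill are hypothetical names, the tag is read through the typed Tag.Val property instead of GetAttribute, and content controls are assumed not to be nested inside each other.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TagMerge
{
    // Hypothetical helper: fills every tagged content control whose tag
    // appears in 'values' and returns how many controls were filled.
    public static int Fill(WordprocessingDocument doc, IDictionary<string, string> values)
    {
        int filled = 0;
        var mainDocument = doc.MainDocumentPart.Document;

        // ToList() so the tree can be modified while looping.
        foreach (var sdt in mainDocument.Descendants<SdtElement>().ToList())
        {
            var tagName = sdt.GetFirstChild<SdtProperties>()?.GetFirstChild<Tag>()?.Val?.Value;
            if (tagName == null || !values.TryGetValue(tagName, out var value))
                continue;

            var texts = sdt.Descendants<Text>().ToList();
            if (texts.Count == 0)
                continue; // nothing to overwrite in this control

            // Put the value into the first <Text> and drop any leftovers.
            texts[0].Text = value;
            foreach (var extra in texts.Skip(1))
                extra.Remove();
            filled++;
        }
        return filled;
    }
}
```

After calling Fill you would still save the main document part, as in the earlier examples.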
I have done some very similar things, but instead of using text substitution strings, I use Word Content Controls. I have documented some of the details in the following blog post, SharePoint and Open Xml. The technique is not specific to SharePoint. You could reuse the pattern in pure ASP.NET or other applications.
Also, I would STRONGLY encourage you to review Eric White's Blog for tips, tricks and techniques regarding Open Xml. Specifically, check out the in-memory manipulation of Open Xml post, and the Word content controls posts. I think you'll find these much more helpful in the long run.
Hope this helps.
When you look at an Open XML document by changing the extension to .zip and opening it, you see that the word subfolder contains a _rels folder in which all the relationships are listed. These relationships point to the parts you mentioned (style ...). You actually need these parts because they contain the definition of the formatting. If you do not copy them, the new document will use the formatting defined in the normal.dot file and not the one defined in the original document. So I think you have to copy them.
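If you would rather inspect those relationships from code than rename the file, something along these lines should work. It is a sketch that assumes the main document part lives at word/document.xml (the location Word normally uses, so its relationships are in word/_rels/document.xml.rels); the DocxRelationships class is a made-up name.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxRelationships
{
    // Hypothetical helper: reads the relationships of the main document part
    // straight from the zip container, without the Open XML SDK.
    public static List<(string Id, string Type, string Target)> ReadMainDocumentRelationships(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Assumed location; other producers may store the main part elsewhere.
            var entry = zip.GetEntry("word/_rels/document.xml.rels");
            if (entry == null)
                return new List<(string Id, string Type, string Target)>();

            XNamespace ns = "http://schemas.openxmlformats.org/package/2006/relationships";
            using (var entryStream = entry.Open())
            {
                return XDocument.Load(entryStream).Root
                    .Elements(ns + "Relationship")
                    .Select(r => (Id: (string)r.Attribute("Id"),
                                  Type: (string)r.Attribute("Type"),
                                  Target: (string)r.Attribute("Target")))
                    .ToList();
            }
        }
    }
}
```

Each entry whose Target is something like styles.xml is one of the parts the copy needs in order to keep the original formatting.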