How to read metadata information from docx documents?

执念已碎 2021-01-01 01:46

What I need to achieve is to have a Word document template (docx) that contains Title, Author name, Date, etc.

This template then will be used by users to comp

3 Answers
  • 2021-01-01 02:22

    1) How do I put the metadata into the template, saying: this is Title, this is Date, this is Name, etc.? (not programmatically)

    You can do that on the Info tab in MS Word 2010, where document properties such as Author and Title can be set manually.

    2) How do I programmatically read that information?

    Once you have created your document (or template), you can always look inside it with the Open XML SDK 2.0 Productivity Tool (which is installed with the Open XML SDK) to see where the information is stored and which classes to use to get or set it.

    Open XML SDK 2.0 Productivity Tool

    Also I think this post might help you to solve your task: Add and update custom document properties in a docx
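
To see what custom document properties look like inside the package without the SDK, here is a minimal Python sketch using only the standard library. It assumes the custom properties live in the part `docProps/custom.xml`, which is where Word normally writes them; the function name `read_custom_properties` is made up for this illustration, and every value is returned as plain text regardless of its declared type.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: Word stores custom properties in this part of the package.
CUSTOM_PART = "docProps/custom.xml"
CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def read_custom_properties(docx_path):
    """Return {property name: value as text} for a .docx/.dotx file (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as package:
        if CUSTOM_PART not in package.namelist():
            return {}  # the document has no custom properties
        root = ET.fromstring(package.read(CUSTOM_PART))

    props = {}
    for prop in root.findall(f"{{{CUSTOM_NS}}}property"):
        # Each <property> holds one typed child such as <vt:lpwstr> or <vt:i4>.
        value = next(iter(prop), None)
        props[prop.get("name")] = value.text if value is not None else None
    return props
```

A document without any custom properties has no such part, so the sketch returns an empty dictionary in that case.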


    UPDATE:

    Hi Dave,

    Please have a look at this MSDN Article - Retrieving Application Properties from Word 2010 Documents by Using the Open XML SDK 2.0

    Hope this is exactly what you are looking for.

  • 2021-01-01 02:42

    All OpenXML documents have built-in core metadata that will do what you need through System.IO.Packaging. Once you open the Word file using the Open XML SDK in C#, you can get to these values via the PackageProperties class, which exposes core properties such as Title, Creator, Subject, Keywords, Created and Modified.
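
The same core metadata can be inspected outside .NET, because a .docx file is a ZIP package. The following is a minimal Python sketch, not the PackageProperties API itself: it assumes the core properties part is at `docProps/core.xml` (where Word normally writes it), and the names `read_core_properties` and `CORE_FIELDS` are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# A subset of the core properties, mapped to their element names.
CORE_FIELDS = {
    "title": "dc:title",
    "creator": "dc:creator",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(docx_path):
    """Return a dict of core properties; missing ones map to None (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as package:
        # Assumption: the core properties part sits at the usual location.
        root = ET.fromstring(package.read("docProps/core.xml"))

    props = {}
    for name, tag in CORE_FIELDS.items():
        element = root.find(tag, NS)
        props[name] = element.text if element is not None else None
    return props
```

Strictly speaking, the location of the part is given by the package relationships, so a more careful version would resolve it from `_rels/.rels` instead of hard-coding the path.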

    You can encourage your users to enter the metadata through Word's Document Information Panel (DIP).

    You can force the panel to show by default when users open your template, through a setting on the Developer tab for the template. See the following article on how to set this in your template.

    I wrote a quick Windows Forms app that displays this information, using an Open XML SDK call to read the PackageProperties of the Word file described above.

    Here is the full solution with the sample word file included.

    Hope this helps.

  • 2021-01-01 02:46

    One way to approach this would be to use Content Controls. In Office, you can create your template, and then for each of your respective inputs of interest you can place one of these controls. They're under the Developer tab in Office.

    After inserting your controls, give each of them a unique name. Office will let them all share the same name, but you need to be able to identify each one uniquely in your template document.

    You now need to get the data that was entered into these controls. Again, there are likely to be better solutions, but Eric White has all kinds of great OpenXML material, and here is one of his posts: Iterating over Content Controls

    I think there are problems with finding content controls nested within a table. If you nest them that way, I think you have to loop over the elements of the table specifically to find the content controls inside it.
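
As a rough illustration of reading content controls, including ones nested in a table, here is a minimal Python sketch using only the standard library. It is not Eric White's code. It assumes the unique name of each control was entered in the Tag field of the control's properties (stored as `w:tag`), and it only looks at the main document part `word/document.xml`, not headers or footers. The function name `read_content_controls` is made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_content_controls(docx_path):
    """Return {tag: text} for every tagged content control in the body (hypothetical helper)."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    controls = {}
    # iter() walks the whole tree, so controls inside table cells are found too.
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find(f"{{{W}}}sdtPr/{{{W}}}tag")
        content = sdt.find(f"{{{W}}}sdtContent")
        if tag is None or content is None:
            continue  # control without a tag: nothing to identify it by
        # Text may be split across several runs, so join all <w:t> pieces.
        text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        controls[tag.get(f"{{{W}}}val")] = text
    return controls
```

Because the walk is recursive, a control that contains other controls will report the text of its children as part of its own text, and two controls with the same tag overwrite each other in the result, which is one more reason to keep the names unique.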

    Also, you're probably going to want to save a .docx from your .dotx file. I don't think there's a built-in "one-liner" for that in OpenXML; however, you can create a new Word document and then write the file stream of the template into the newly created .docx file. Again, of course, there may be better solutions out there.
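
One way to picture the template-to-document step is the Python sketch below. It is a variation on the idea above rather than the exact stream-copy described: it copies every part of the package and switches the main part's content type from "template" to "document" in `[Content_Types].xml`. It assumes a plain .dotx (not a macro-enabled .dotm) whose content types file is UTF-8 encoded, and the function name `template_to_document` is made up for this illustration.

```python
import zipfile

# Content types of the main part in a .dotx and in a .docx.
TEMPLATE_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.template.main+xml"
)
DOCUMENT_TYPE = (
    "application/vnd.openxmlformats-officedocument."
    "wordprocessingml.document.main+xml"
)


def template_to_document(dotx_path, docx_path):
    """Copy a .dotx package to docx_path, marking it as a document (hypothetical helper)."""
    with zipfile.ZipFile(dotx_path) as source, \
            zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "[Content_Types].xml":
                # Assumption: the file is UTF-8, so a byte-level replace is safe.
                data = data.replace(TEMPLATE_TYPE.encode(), DOCUMENT_TYPE.encode())
            target.writestr(item, data)
```

All other parts, including the document properties and content controls, are copied unchanged.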

    Have you been here? There's lots of good stuff: Introduction to OpenXML

    Additionally, Eric has been releasing more and more videos on the OpenXML YouTube channel
