How to extract the contents of an OLE container?

Submitted by 蹲街弑〆低调 on 2020-01-13 10:43:27

Question


I need to break open an MS Word file (.doc) and extract its constituent files ('[1]CompObj', 'WordDocument', etc.). Something like 7-zip can be used to do this manually, but I need to do it programmatically.

I've gathered that a Word document is an OLE container (which is why 7-zip can be used to view its contents), but I can't work out how to do the following in C++:

  1. open the OLE container
  2. extract each constituent file and save it to disk

I've found a couple of examples of OLE automation (e.g. here), but what I want to do seems to be less common and I've found no specific examples.

If anyone knows of an API (?!) or a tutorial for working with OLE, I'd be grateful. Ditto for any code samples.


Answer 1:


The format is called Compound Files, and it is part of the Structured Storage API. You start with StgOpenStorageEx(). It buys you little for a Word .doc file, though: the streams themselves use a sophisticated binary format. To really read the document content you want to use automation, letting Word read the file. That's rarely done in C++, but that project shows you how.
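For the narrower goal in the question (dumping each stream to disk, not interpreting it), the Structured Storage calls can be sketched roughly as below. This is a minimal sketch rather than a definitive implementation: the function names `SafeFileName`, `SaveStream`, `ExtractStorage` and `ExtractCompoundFile` are made up for illustration, error handling is kept thin, and the Windows part assumes you link against ole32. The `[n]` naming for control characters is an assumption chosen to match how 7-zip displays a stream such as `\x01CompObj`.

```cpp
#include <string>

// Portable helper: make a stream name usable as a file name.
// Control characters (e.g. the 0x01 prefix of "\x01CompObj") become "[n]".
std::wstring SafeFileName(const std::wstring& name) {
    std::wstring out;
    for (wchar_t ch : name) {
        if (static_cast<unsigned long>(ch) < 0x20) {
            out += L'[';
            out += std::to_wstring(static_cast<unsigned long>(ch));
            out += L']';
        } else {
            out += ch;
        }
    }
    return out;
}

#ifdef _WIN32
#include <windows.h>
#include <objbase.h>

// Copy one stream of `stg` into a file under outDir.
static HRESULT SaveStream(IStorage* stg, const wchar_t* name,
                          const std::wstring& outDir) {
    IStream* strm = nullptr;
    HRESULT hr = stg->OpenStream(name, nullptr,
                                 STGM_READ | STGM_SHARE_EXCLUSIVE, 0, &strm);
    if (FAILED(hr)) return hr;

    std::wstring path = outDir + L"\\" + SafeFileName(name);
    HANDLE h = CreateFileW(path.c_str(), GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (h == INVALID_HANDLE_VALUE) {
        strm->Release();
        return HRESULT_FROM_WIN32(GetLastError());
    }

    char buf[4096];
    ULONG got = 0;
    // Read returns a success code with got == 0 once the stream is exhausted.
    while (SUCCEEDED(hr = strm->Read(buf, sizeof buf, &got)) && got > 0) {
        DWORD written = 0;
        WriteFile(h, buf, got, &written, nullptr);
    }
    CloseHandle(h);
    strm->Release();
    return hr;
}

// Walk a storage: streams become files, nested storages become directories.
HRESULT ExtractStorage(IStorage* stg, const std::wstring& outDir) {
    IEnumSTATSTG* en = nullptr;
    HRESULT hr = stg->EnumElements(0, nullptr, 0, &en);
    if (FAILED(hr)) return hr;

    STATSTG st;
    while (en->Next(1, &st, nullptr) == S_OK) {
        if (st.type == STGTY_STREAM) {
            SaveStream(stg, st.pwcsName, outDir);
        } else if (st.type == STGTY_STORAGE) {
            IStorage* child = nullptr;
            if (SUCCEEDED(stg->OpenStorage(st.pwcsName, nullptr,
                                           STGM_READ | STGM_SHARE_EXCLUSIVE,
                                           nullptr, 0, &child))) {
                std::wstring sub = outDir + L"\\" + SafeFileName(st.pwcsName);
                CreateDirectoryW(sub.c_str(), nullptr);
                ExtractStorage(child, sub);
                child->Release();
            }
        }
        CoTaskMemFree(st.pwcsName);  // the enumerator allocates the name
    }
    en->Release();
    return S_OK;
}

// Open the compound file read-only and dump everything under outDir.
HRESULT ExtractCompoundFile(const std::wstring& docPath,
                            const std::wstring& outDir) {
    IStorage* root = nullptr;
    HRESULT hr = StgOpenStorageEx(docPath.c_str(),
                                  STGM_READ | STGM_SHARE_DENY_WRITE,
                                  STGFMT_STORAGE, 0, nullptr, nullptr,
                                  IID_PPV_ARGS(&root));
    if (FAILED(hr)) return hr;
    hr = ExtractStorage(root, outDir);
    root->Release();
    return hr;
}
#endif  // _WIN32
```

A call such as `ExtractCompoundFile(L"report.doc", L"out")` (both paths hypothetical, with `out` already existing) should leave files like `[1]CompObj` and `WordDocument` in the output directory. The files are raw stream bytes; as noted above, making sense of them is a separate problem.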




Answer 2:


This site, http://www.endurasoft.com/vcd/ststo.htm, contains a tutorial, API information, and a code sample that does everything I was looking for.



Source: https://stackoverflow.com/questions/3141902/how-to-extract-the-contents-of-an-ole-container