问题
Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:
f = open('c:\file.doc', "w")
f.write(text)
f.close()
but Word will read it as an HTML file not a native .doc file.
回答1:
I'd look into IronPython which intrinsically has access to windows/office APIs because it runs on .NET runtime.
回答2:
See python-docx, its official documentation is available here.
This has worked very well for me.
回答3:
If you only what to read, it is simplest to use the linux soffice command to convert it to text, and then load the text into python:
回答4:
doc (Word 2003 in this case) and docx (Word 2007) are different formats, where the latter is usually just an archive of xml and image files. I would imagine that it is very possible to write to docx files by manipulating the contents of those xml files. However I don't see how you could read and write to a doc file without some type of COM component interface.
来源:https://stackoverflow.com/questions/188444/reading-writing-ms-word-files-in-python