How can I count words in complex documents (.rtf, .doc, .odt, etc)?
I'm trying to write a Python function that, given the path to a document file, returns the number of words in that document. This is fairly easy to do with .txt files, and there are tools that allow me to hack support for a few more complex document formats together, but I want a really comprehensive solution. Looking at OpenOffice.org's py-uno scripting interface and list of supported formats, it would seem ideal to load the documents in a headless OOo and call its word-count function. However, I can't find any py-uno tutorials or sample code that go beyond basic document generation, and even