Read docx files via Python

Submitted by 孤人 on 2019-12-12 03:08:00

Question


Does anyone know a Python library to read docx files?

I have a Word document that I am trying to read data from.


Answer 1:


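A quick search of PyPI turns up the docx package. The actively maintained library for this today is python-docx, which is installed as python-docx but imported as docx.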




Answer 2:


python-docx can read as well as write.

import docx

doc = docx.Document('myfile.docx')
allText = []
for docpara in doc.paragraphs:
    allText.append(docpara.text)

Now the text of every paragraph is in the list allText.

Thanks to "Automate the Boring Stuff with Python" by Al Sweigart for the pointer.




Answer 3:


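import docx
from docx.opc.exceptions import PackageNotFoundError


def main():
    try:
        doc = docx.Document('test.docx')  # Open the document for reading.
    except (IOError, PackageNotFoundError):
        # python-docx raises PackageNotFoundError, not IOError, for a missing file.
        print('There was an error opening the file!')
        return

    # Collect the text of every paragraph, then join once after the loop.
    fullText = [para.text for para in doc.paragraphs]
    data = '\n'.join(fullText)
    print(data)


if __name__ == '__main__':
    main()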

And don't forget to install python-docx first with pip install python-docx (the package is installed as python-docx but imported as docx).
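If installing python-docx is not an option, paragraph text can also be pulled out with the standard library alone, because a .docx file is a ZIP archive whose main body is the XML part word/document.xml. This is a rough sketch, assuming the common (transitional) WordprocessingML namespace; the function name docx_paragraphs is hypothetical. It ignores tabs, line breaks, fields, headers, and footers, and may mishandle text boxes, so python-docx remains the more robust choice.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML elements such as <w:p> (paragraph) and <w:t> (text).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each <w:p> paragraph in word/document.xml.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    # iter() walks the whole tree, so paragraphs inside tables are included too.
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is split across one or more <w:t> elements.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs
```

Unlike doc.paragraphs in python-docx, this walks table cells as well, in document order.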



Source: https://stackoverflow.com/questions/29309085/read-docx-files-via-python
