Question
I wrote the following code to convert a docx file to a text file. The text file ends up containing only the last paragraph of the document, not the complete content. The code is as follows:
from docx import Document
import io
import os
import shutil

def convertDocxToText(path):
    for d in os.listdir(path):
        fileExtension = d.split(".")[-1]
        if fileExtension == "docx":
            docxFilename = path + d
            print(docxFilename)
            document = Document(docxFilename)
            # for printing the complete document
            print('\nThe whole content of the document:->>>\n')
            for para in document.paragraphs:
                textFilename = path + d.split(".")[0] + ".txt"
                with io.open(textFilename, "w", encoding="utf-8") as textFile:
                    #textFile.write(unicode(para.text))
                    x = unicode(para.text)
                    print(x)  # the complete content gets printed by this line
                    textFile.write(x)  # but after writing to the text file, only the last paragraph is there
                    #textFile.write(para.text)

path = "/home/python/resumes/"
convertDocxToText(path)
Answer 1:
Problem
Look at the last for loop in your code:
for para in document.paragraphs:
    textFilename = path + d.split(".")[0] + ".txt"
    with io.open(textFilename, "w", encoding="utf-8") as textFile:
        x = unicode(para.text)
        textFile.write(x)
For each paragraph in the document, you open a file named textFilename. Say you have a file named MyFile.docx in /home/python/resumes/. Then textFilename is /home/python/resumes/MyFile.txt on every iteration of the for loop. The problem is that you open the same file each time in w (write) mode, which truncates the file and overwrites its content, so only the last paragraph is left at the end.
Solution:
Open the file once, outside that for loop, and then write the paragraphs to it one by one.
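The difference can be reproduced without python-docx. Below is a minimal sketch, assuming a plain list of strings stands in for document.paragraphs; the function names and the temporary file are made up for illustration, and the newline after each paragraph is an addition of this sketch.

```python
import io
import os
import tempfile


def write_reopening_each_time(paragraphs, filename):
    # Buggy pattern: "w" truncates the file on every open,
    # so only the last paragraph survives.
    for text in paragraphs:
        with io.open(filename, "w", encoding="utf-8") as text_file:
            text_file.write(text)


def write_opening_once(paragraphs, filename):
    # Fixed pattern: open once, then write every paragraph.
    with io.open(filename, "w", encoding="utf-8") as text_file:
        for text in paragraphs:
            text_file.write(text + "\n")


def read_back(filename):
    with io.open(filename, encoding="utf-8") as text_file:
        return text_file.read()


# Stand-in for the text of document.paragraphs (an assumption of this sketch).
paragraphs = ["first", "second", "third"]
filename = os.path.join(tempfile.mkdtemp(), "demo.txt")

write_reopening_each_time(paragraphs, filename)
buggy_result = read_back(filename)  # only the last paragraph is left

write_opening_once(paragraphs, filename)
fixed_result = read_back(filename)  # all three paragraphs, one per line
```

Opening in "a" (append) mode inside the loop would also keep every paragraph, but it would keep appending to the old output on each rerun, so opening once in "w" mode is usually the simpler choice.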
Answer 2:
The following code solves the problem described above:
from docx import Document
import io
import shutil
import os

def convertDocxToText(path):
    for d in os.listdir(path):
        fileExtension = d.split(".")[-1]
        if fileExtension == "docx":
            docxFilename = path + d
            print(docxFilename)
            document = Document(docxFilename)
            textFilename = path + d.split(".")[0] + ".txt"
            # open the output file once, before looping over the paragraphs
            with io.open(textFilename, "w", encoding="utf-8") as textFile:
                for para in document.paragraphs:
                    # unicode() exists only in Python 2; on Python 3 write para.text directly
                    textFile.write(unicode(para.text))

path = "/home/python/resumes/"
convertDocxToText(path)
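The code above relies on unicode, which exists only in Python 2. A minimal Python 3 sketch of the same idea follows, assuming python-docx is installed. The names text_path_for, docx_to_text and convert_folder are hypothetical, and writing a newline after each paragraph is a choice made here, not something the original answer does.

```python
import os


def text_path_for(docx_path):
    # Swap the extension with splitext so that names containing
    # extra dots (e.g. "cv.v2.docx") keep their full stem.
    stem, _ = os.path.splitext(docx_path)
    return stem + ".txt"


def docx_to_text(docx_path):
    # Imported here so the helper above works without python-docx.
    from docx import Document

    document = Document(docx_path)
    text_path = text_path_for(docx_path)
    # Open the output once, then write every paragraph.
    with open(text_path, "w", encoding="utf-8") as text_file:
        for para in document.paragraphs:
            text_file.write(para.text + "\n")
    return text_path


def convert_folder(folder):
    # Convert every .docx file directly inside the folder.
    converted = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".docx"):
            converted.append(docx_to_text(os.path.join(folder, name)))
    return converted
```

Calling convert_folder("/home/python/resumes/") would write a .txt file next to each .docx file. Note that document.paragraphs covers body paragraphs only, so text inside tables is not included in the output.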
Source: https://stackoverflow.com/questions/52719258/docx-file-to-text-file-conversion-using-python