Question
I am using Python 3. My code uses pdfminer to convert PDF files to text, and I want the resulting .txt files written to a new folder. Currently they end up in the same folder as the source PDFs. How do I redirect the output to a different folder? I want the output in a folder called "D:\extracted_text". Code so far:
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from io import StringIO
import glob
import os
def convert(fname, pages=None):
    # An empty set tells PDFPage.get_pages to yield every page
    if not pages:
        pagenums = set()
    else:
        pagenums = set(pages)
    output = StringIO()
    manager = PDFResourceManager()
    converter = TextConverter(manager, output, laparams=LAParams())
    interpreter = PDFPageInterpreter(manager, converter)
    infile = open(fname, 'rb')
    for page in PDFPage.get_pages(infile, pagenums):
        interpreter.process_page(page)
    infile.close()
    converter.close()
    text = output.getvalue()
    output.close()
    # Intended to save the text as a .txt file inside D:/extracted_text
    savepath = 'D:/extracted_text/'
    outfile = os.path.splitext(fname)[0] + '.txt'
    comp_name = os.path.join(savepath, outfile)
    print(outfile)
    with open(comp_name, 'w', encoding='utf-8') as pdf_file:
        pdf_file.write(text)
    return text

directory = glob.glob(r'D:\files\*.pdf')
for myfiles in directory:
    convert(myfiles)
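To see where the files are actually going, the path-building step can be reproduced on its own with ntpath, the module that os.path refers to on Windows (it can be imported on any OS). A minimal sketch, assuming a hypothetical input file D:\files\report.pdf as glob.glob would return it:

```python
import ntpath  # Windows flavour of os.path, importable on any platform

fname = r"D:\files\report.pdf"   # hypothetical path, in the form glob.glob returns
savepath = "D:/extracted_text/"

outfile = ntpath.splitext(fname)[0] + ".txt"   # splitext only strips the extension
comp_name = ntpath.join(savepath, outfile)

print(outfile)    # D:\files\report.txt  (still contains the source folder)
print(comp_name)  # D:\files\report.txt  (savepath has been dropped)
```

Because outfile still carries a drive and a root, os.path.join treats it as an absolute path and discards every segment before it, so savepath never makes it into the result.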
Answer 1:
You can use os.path.join: give it your directory path and the file name with its extension, and it builds the full path of the file to create. You can use it like this:
with open(os.path.join(dir_path, fileCompleteName), "w") as file1:
    file1.write("Hello World")
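On Windows, either of these should work for the output folder:
"D:/extracted_text/" (passed to os.path.join together with the file name)
os.path.join("D:/", "extracted_text", outfile)
Note that os.path.join("/", "D:", "extracted_text", outfile) does not give D:\extracted_text: a bare "D:" with no slash after it means "the current directory on drive D", so the result is the relative path D:extracted_text\...
Make sure the directory "D:/extracted_text" exists.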
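The forms above can be checked without a Windows machine by calling ntpath (the Windows implementation behind os.path) directly. A small sketch, assuming a hypothetical output name report.txt:

```python
import ntpath  # Windows path rules, importable on any OS

name = "report.txt"  # hypothetical output file name

a = ntpath.join("D:/extracted_text/", name)
b = ntpath.join("D:/", "extracted_text", name)
c = ntpath.join("/", "D:", "extracted_text", name)  # bare "D:" resets the path

print(a)  # D:/extracted_text/report.txt
print(b)  # D:/extracted_text\report.txt
print(c)  # D:extracted_text\report.txt
# Only a path with both a drive and a root is absolute on Windows
print(ntpath.isabs(a), ntpath.isabs(b), ntpath.isabs(c))  # True True False
```

Mixed "/" and "\" separators, as in a and b, are accepted by Windows file APIs.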
Answer 2:
The problem lies in line:
outfile = os.path.splitext(os.path.abspath(fname))[0] + '.txt'
If you print out outfile, you'll see that it contains the full path of your file. Because that path is absolute, os.path.join(savepath, outfile) ignores savepath. Replace it with:
outfile = os.path.splitext(os.path.basename(fname))[0] + '.txt'
This should solve your problem! Note that this will break if 'D:/extracted_text/' does not exist, so either create that directory manually or programmatically with os.makedirs.
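A minimal sketch of both fixes together, assuming a hypothetical helper named txt_path_for: os.path.basename drops the source folder (glob returns full paths, so splitext alone leaves it in) and os.makedirs with exist_ok=True creates the output folder only if it is missing. The demo uses a temporary folder instead of D:/extracted_text so it runs anywhere:

```python
import os
import tempfile

def txt_path_for(pdf_path, out_dir):
    """Hypothetical helper: return out_dir/<pdf name>.txt, creating out_dir if needed."""
    os.makedirs(out_dir, exist_ok=True)  # no error if the folder already exists
    stem = os.path.splitext(os.path.basename(pdf_path))[0]  # ".../report.pdf" -> "report"
    return os.path.join(out_dir, stem + ".txt")

# Demo in a temporary folder standing in for D:/
demo_dir = tempfile.mkdtemp()
demo_out = os.path.join(demo_dir, "extracted_text")
result = txt_path_for(os.path.join(demo_dir, "files", "report.pdf"), demo_out)
print(result)
```

Inside convert, the three savepath/outfile/comp_name lines could then become comp_name = txt_path_for(fname, 'D:/extracted_text/').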
EDIT: To break the problem into smaller pieces, open a new file and run this snippet. If it does the trick, make the same changes in your original code:
import os
fname = "some_file.pdf"
text = "Here's the extracted text"
savepath = 'D:/extracted_text/'
outfile = os.path.splitext(fname)[0] + '.txt'
print(outfile)
comp_name = os.path.join(savepath,outfile)
print(comp_name)
with open(comp_name, 'w', encoding='utf-8') as pdf_file:
    pdf_file.write(text)
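The same steps can also be written with pathlib instead of os.path; this is an alternative, not what either answer uses. A sketch, assuming a hypothetical helper save_text and a temporary folder in place of D:/extracted_text:

```python
import tempfile
from pathlib import Path

def save_text(pdf_path, text, out_dir):
    """Hypothetical helper: write text to out_dir/<pdf name>.txt and return that path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    # with_suffix swaps .pdf for .txt; .name keeps only the file name, not its folder
    target = out_dir / Path(pdf_path).with_suffix(".txt").name
    target.write_text(text, encoding="utf-8")
    return target

demo = Path(tempfile.mkdtemp())
saved = save_text(demo / "files" / "some_file.pdf",
                  "Here's the extracted text",
                  demo / "extracted_text")
print(saved)
```

In the original script this would be called as save_text(fname, text, "D:/extracted_text") at the end of convert.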
Source: https://stackoverflow.com/questions/56482437/redirect-output-of-a-function-that-converts-pdf-to-txt-files-to-a-new-folder-in