Question
I am trying to extract data as HTML from a PDF using pdfminer. I was able to extract plain text from the same PDF, but I get an error when extracting it as HTML. I need the HTML so that I can filter the data further and categorize it in a CSV file. This is the script:
from io import StringIO
from pdfminer.layout import LAParams
from pdfminer.high_level import extract_text_to_fp

output_string = StringIO
with open('mini.pdf', 'rb') as fn:
    extract_text_to_fp(fn, output_string, laparams=LAParams(), output_type='html', codec=None)
And this is the error I am getting.
Answer 1:
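This code works for me. Note that it creates the buffer by calling StringIO(), whereas the script in the question assigns the StringIO class itself, so there is no buffer instance for pdfminer to write to.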
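from io import StringIO
from pdfminer.high_level import extract_text_to_fp
from pdfminer.layout import LAParams

def convert_html(filename):
    # Collect the HTML in an in-memory text buffer (note the call: StringIO()).
    output = StringIO()
    with open(filename, 'rb') as fin:
        extract_text_to_fp(fin, output, laparams=LAParams(),
                           output_type='html', codec=None)
    return output.getvalue()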
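Since the goal is to filter the HTML further and categorize it in CSV, here is a rough sketch of one way that next step might look, using only the standard library. The names SpanCollector, html_to_csv and sample_html are made up for illustration, and the sample markup is a hand-written simplification; the HTML that pdfminer actually produces carries more attributes and will likely need different filtering.

```python
import csv
from html.parser import HTMLParser
from io import StringIO


class SpanCollector(HTMLParser):
    """Hypothetical helper: collect (style, text) pairs for each <span>."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished (style, text) pairs
        self._style = None   # style of the span currently being read
        self._chunks = []    # text pieces seen inside that span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._style = dict(attrs).get("style", "")
            self._chunks = []

    def handle_data(self, data):
        if self._style is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._style is not None:
            # Collapse line breaks and repeated spaces into single spaces.
            text = " ".join("".join(self._chunks).split())
            if text:
                self.rows.append((self._style, text))
            self._style = None


def html_to_csv(html_text):
    """Return CSV text with one row per non-empty <span>."""
    parser = SpanCollector()
    parser.feed(html_text)
    parser.close()
    out = StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["style", "text"])
    writer.writerows(parser.rows)
    return out.getvalue()


# Assumed, simplified input; in practice pass the string from convert_html().
sample_html = (
    '<div><span style="font-size:14px">Invoice\n<br></span></div>'
    '<div><span style="font-size:9px">Total: 42\n<br></span></div>'
)
print(html_to_csv(sample_html))
```

Keeping the style string next to each piece of text is only one possible choice; it lets you group rows later, for example by font size, when deciding which category a row belongs to.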
Source: https://stackoverflow.com/questions/63523909/i-am-trying-to-extract-data-as-html-elements-in-python-using-pdfminer