How can I read a PDF file from inline raw_bytes (not from a file)?

I am trying to create a PDF puller for the Australian Stock Exchange (ASX) website that will let me search through all the 'Announcements' made by companies and search fo…

2 Answers

梦谈多话

2020-12-10 14:47

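You can use io.BytesIO to wrap the downloaded bytes in an in-memory, file-like object and pass that to the PDF reader instead of a file on disk: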

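import io

import PyPDF2  # PyPDF2 3.x: PdfReader replaces the removed PdfFileReader
import requests

url = 'http://www.asx.com.au/asxpdf/20171108/pdf/43p1l61zf2yct8.pdf'
response = requests.get(url)
response.raise_for_status()  # stop on HTTP errors instead of parsing an error page

# Wrap the raw bytes in an in-memory binary stream; nothing is written to disk
with io.BytesIO(response.content) as open_pdf_file:
    read_pdf = PyPDF2.PdfReader(open_pdf_file)
    num_pages = len(read_pdf.pages)
    print(num_pages)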

PS: When opening files (or file-like objects such as BytesIO), always use a context manager (a with statement) so they are closed automatically.
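
The approach above works because io.BytesIO turns a bytes object into a seekable binary stream, which a PDF reader accepts in place of a file opened with open(path, 'rb'). One practical snag when downloading is that a server may return something other than a PDF (an HTML error or notice page, for instance), and the failure then surfaces deep inside the parser. Below is a minimal stdlib-only sketch of a hypothetical helper, pdf_stream_from_bytes (not part of PyPDF2 or requests), that checks for the %PDF- header before wrapping the bytes. Scanning only the first 1024 bytes is a heuristic assumption, not full PDF validation.

```python
import io

PDF_HEADER = b"%PDF-"


def pdf_stream_from_bytes(raw: bytes, window: int = 1024) -> io.BytesIO:
    """Wrap downloaded PDF bytes in an in-memory binary stream.

    Hypothetical helper: it only checks that the PDF header appears within
    the first `window` bytes (a heuristic), then returns a BytesIO that can
    be passed to a PDF reader instead of an open file.
    """
    if PDF_HEADER not in raw[:window]:
        raise ValueError("data does not look like a PDF (no %PDF- header found)")
    # A new BytesIO starts at offset 0 and supports read/seek/tell like a file
    return io.BytesIO(raw)
```

With PyPDF2 3.x this would be used as read_pdf = PyPDF2.PdfReader(pdf_stream_from_bytes(response.content)), so a non-PDF response fails early with a clear message.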

日久生厌

2020-12-10 14:59

Finally I got it! Try this (using just io):

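import io

import PyPDF2  # PyPDF2 3.x API (PdfReader, is_encrypted, pages, extract_text)
import requests

url = 'http://www.asx.com.au/asxpdf/20171103/pdf/43nyyw9r820c6r.pdf'
response = requests.get(url)
response.raise_for_status()
my_raw_data = response.content

# Wrap the raw bytes so PdfReader can treat them like an open binary file
pdf_content = io.BytesIO(my_raw_data)
pdf_reader = PyPDF2.PdfReader(pdf_content)

# Some PDFs are encrypted with an empty user password; decrypt before reading
if pdf_reader.is_encrypted:
    pdf_reader.decrypt("")

print(pdf_reader.pages[0].extract_text())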
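
The encrypted/unencrypted branches above can be folded into one small function so the text extraction is written only once. This is a duck-typed sketch that assumes a PyPDF2 3.x / pypdf-style reader exposing is_encrypted, decrypt(password), pages and page.extract_text(); the name first_page_text is hypothetical. It also assumes decrypt() returns a falsy value (0) when the password is rejected, and raises in that case rather than continuing silently.

```python
from typing import Any


def first_page_text(reader: Any, password: str = "") -> str:
    """Return the text of the first page, decrypting first if needed.

    Assumes a PyPDF2 3.x / pypdf-style reader exposing `is_encrypted`,
    `decrypt(password)`, `pages`, and `page.extract_text()`.
    """
    if reader.is_encrypted:
        # Many "encrypted" PDFs only restrict permissions and open with an empty password.
        # decrypt() is assumed to return a falsy value when the password is rejected.
        if not reader.decrypt(password):
            raise ValueError("could not decrypt PDF with the given password")
    if len(reader.pages) == 0:
        raise ValueError("PDF has no pages")
    return reader.pages[0].extract_text()
```

Usage with the variables above: print(first_page_text(PyPDF2.PdfReader(io.BytesIO(my_raw_data)))).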

Good luck! :)