How can I read a PDF file from inline raw_bytes (not from a file)?

别那么骄傲 2020-12-10 14:08

I am trying to create a PDF puller for the Australian Stock Exchange website which will allow me to search through all the 'Announcements' made by companies and search fo

2 Answers
  • 2020-12-10 14:47

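    You can wrap the downloaded bytes in io.BytesIO, which gives PyPDF2 a file-like object to read from, so nothing has to be written to disk: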

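    import requests, PyPDF2, io
    
    url = 'http://www.asx.com.au/asxpdf/20171108/pdf/43p1l61zf2yct8.pdf'
    response = requests.get(url)
    response.raise_for_status()  # stop early on an HTTP error
    
    # response.content is the PDF as raw bytes; BytesIO makes it file-like
    with io.BytesIO(response.content) as open_pdf_file:
        # Note: PyPDF2 3.0+ renames these to PdfReader and len(reader.pages)
        read_pdf = PyPDF2.PdfFileReader(open_pdf_file)
        num_pages = read_pdf.getNumPages()
        print(num_pages)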
    
    2
    

    P.S. When opening files or file-like objects, always use a context manager (a with-statement) so they are closed automatically.
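
    Why this works: io.BytesIO supports the same read, seek and tell calls as a file opened in binary mode, which is what a PDF parser relies on. The sketch below only illustrates that behaviour with the standard library. The raw_bytes value is a made-up placeholder rather than a real PDF; in the answer above the bytes come from response.content.

```python
import io

# Placeholder bytes for illustration only (an assumption, not a complete PDF).
raw_bytes = b"%PDF-1.4\n% hypothetical placeholder, not a complete PDF\n"

with io.BytesIO(raw_bytes) as buffer:
    header = buffer.read(8)      # read like a file opened in binary mode
    buffer.seek(0, io.SEEK_END)  # jump to the end of the data
    size = buffer.tell()         # position at the end equals the byte count
    buffer.seek(0)               # rewind before handing the buffer to a parser

# Leaving the with-block closes the buffer, just like a regular file.
closed_after_with = buffer.closed
print(header, size, closed_after_with)
```

    Because the buffer is closed when the with-block exits, any work with the reader has to happen inside the block.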

  • 2020-12-10 14:59

    I finally got it working. Try this (it only needs io):

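    import requests, PyPDF2, io
    
    url = 'http://www.asx.com.au/asxpdf/20171103/pdf/43nyyw9r820c6r.pdf'
    response = requests.get(url)
    my_raw_data = response.content  # the PDF as raw bytes
    
    # Wrap the bytes in an in-memory, file-like buffer
    pdf_content = io.BytesIO(my_raw_data)
    # Note: PyPDF2 3.0+ renames these to PdfReader, is_encrypted,
    # reader.pages[0] and extract_text()
    pdf_reader = PyPDF2.PdfFileReader(pdf_content)
    
    # Encrypted files may open with an empty password, so try that first
    if pdf_reader.isEncrypted:
        pdf_reader.decrypt("")
    
    print(pdf_reader.getPage(0).extractText())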
    

    Good Luck ... :)
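
    One optional safeguard, sketched under assumptions: a download can return something other than a PDF (an HTML error page, for example), so it may help to check for the '%PDF-' marker that PDF files begin with before building the buffer. The helper names looks_like_pdf and open_pdf_buffer are hypothetical, and this is a simple check that assumes the marker sits at the very start of the data.

```python
import io


def looks_like_pdf(raw_bytes: bytes) -> bool:
    """Return True if the bytes start with the '%PDF-' marker."""
    return raw_bytes.startswith(b"%PDF-")


def open_pdf_buffer(raw_bytes: bytes) -> io.BytesIO:
    """Wrap raw bytes in a BytesIO, refusing data that is clearly not a PDF."""
    if not looks_like_pdf(raw_bytes):
        raise ValueError("data does not start with the %PDF- marker")
    return io.BytesIO(raw_bytes)
```

    The returned buffer could then be passed to the reader in place of io.BytesIO(my_raw_data).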
