Open an Azure StorageStreamDownloader without saving it as a file

孤独总比滥情好 2021-01-16 06:44

I need to download a PDF from a blob container in Azure as a download stream (StorageStreamDownloader) and open it in both PDFPlumber and PDFminer. I developed all the requi

2 Answers
  • 2021-01-16 07:19

    download_blob() downloads the blob into a StorageStreamDownloader object. That class has a download_to_stream() method, which writes the blob contents into a stream you supply, such as an in-memory BytesIO.

    from io import BytesIO

    import PyPDF2
    from azure.storage.blob import BlobServiceClient

    filename = "test.pdf"
    container_name = "test"

    blob_service_client = BlobServiceClient.from_connection_string("connection string")
    container_client = blob_service_client.get_container_client(container_name)
    blob_client = container_client.get_blob_client(filename)
    streamdownloader = blob_client.download_blob()

    # Write the blob into an in-memory buffer instead of a file on disk
    stream = BytesIO()
    streamdownloader.download_to_stream(stream)
    stream.seek(0)  # rewind so readers start at the beginning of the PDF

    # PdfFileReader/numPages are the older PyPDF2 names; PyPDF2 3.x uses PdfReader and len(reader.pages)
    fileReader = PyPDF2.PdfFileReader(stream)
    print(fileReader.numPages)
    

    Running this prints the number of pages in the PDF.
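
    The question asks about PDFPlumber and PDFminer rather than PyPDF2. As far as I know, both pdfplumber.open() and pdfminer.six's extract_text() accept a file-like object, so the same in-memory approach should carry over. The following is only a sketch under that assumption: open_with_both is a hypothetical helper name, and the usage lines assume the blob_client from the snippet above.

```python
from io import BytesIO


def open_with_both(pdf_bytes: bytes):
    """Return (page_count, text) for a PDF held entirely in memory."""
    # Imported lazily so the helper can be defined even if the libraries are missing.
    import pdfplumber
    from pdfminer.high_level import extract_text

    # Give each library its own buffer so neither depends on the other's file position.
    with pdfplumber.open(BytesIO(pdf_bytes)) as pdf:
        page_count = len(pdf.pages)
    text = extract_text(BytesIO(pdf_bytes))
    return page_count, text


# Usage (assumes `blob_client` was created as in the snippet above):
# pdf_bytes = blob_client.download_blob().readall()
# page_count, text = open_with_both(pdf_bytes)
```

    Reading the blob once with readall() and wrapping the bytes in a fresh BytesIO per library avoids having to rewind a shared stream between the two readers.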

  • 2021-01-16 07:24

    It seems download_to_stream() is now deprecated; readinto() should be used instead.

    from azure.storage.blob import BlobClient

    conn_string = ''
    container_name = ''
    blob_name = ''
    blob_obj = BlobClient.from_connection_string(
        conn_str=conn_string, container_name=container_name,
        blob_name=blob_name
    )
    # readinto() writes the blob into any writable binary stream; here, a local file
    with open(blob_name, 'wb') as f:
        b = blob_obj.download_blob()
        b.readinto(f)
    

    This creates a file in the working directory containing the downloaded data.
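
    Since the question is about avoiding a file on disk, note that readinto() should work with any writable binary stream, not only a file handle. Below is a minimal sketch of that idea: blob_to_buffer is a hypothetical helper, and FakeDownloader is a made-up stand-in for StorageStreamDownloader so the example runs without an Azure account.

```python
from io import BytesIO


def blob_to_buffer(downloader) -> BytesIO:
    """Copy a downloaded blob into an in-memory, seekable buffer.

    `downloader` is expected to behave like the StorageStreamDownloader
    returned by BlobClient.download_blob(); only readinto() is used.
    """
    buffer = BytesIO()
    downloader.readinto(buffer)  # write the whole blob into the buffer
    buffer.seek(0)               # rewind so readers start at the first byte
    return buffer


class FakeDownloader:
    """Hypothetical stand-in for StorageStreamDownloader, for local testing."""

    def __init__(self, data: bytes):
        self._data = data

    def readinto(self, stream) -> int:
        stream.write(self._data)
        return len(self._data)


buffer = blob_to_buffer(FakeDownloader(b"%PDF-1.4 example bytes"))
print(buffer.read(8))
```

    With a real blob, the call would be blob_to_buffer(blob_obj.download_blob()), and the returned buffer can be handed to a PDF library in place of a file path.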
