Creating a BytesIO object

忘了有多久 2021-01-24 23:18

I am working on a Scrapy spider, trying to extract the text from multiple PDF files in a directory, using slate. I have no interest in saving the actual PDF to disk, and so I've …

1 answer
  • 2021-01-25 00:02

    Calling in_memory_pdf.read(response.body) is the problem: read() expects the number of bytes to read, not the data itself. You want to initialize the buffer with the data, not read into it.
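To make the difference concrete, here is a minimal sketch in Python 3. The `data` value is a hypothetical stand-in for response.body, not a real PDF:

```python
from io import BytesIO

# Hypothetical stand-in for response.body (placeholder bytes, not a valid PDF)
data = b"%PDF-1.4 placeholder"

buf = BytesIO(data)       # initialise the buffer with the data
first = buf.read(4)       # read(n) returns up to n bytes from the current position
remaining = buf.read()    # no argument: read everything that is left

empty = BytesIO()         # an empty buffer has nothing to read
try:
    empty.read(data)      # passing data where a byte count is expected
    raised = False
except TypeError:
    raised = True         # Python 3 rejects a bytes argument here
```

The constructor is what fills the buffer; read() only consumes what is already there.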

    In Python 2, just initialize the BytesIO with the data:

     in_memory_pdf = BytesIO(response.body)
    

    In Python 3, BytesIO cannot be initialized with a str because it expects bytes. The error message shows that response.body is of type str, so it has to be encoded first:

     in_memory_pdf = BytesIO(bytes(response.body, 'ascii'))
    

    But since a PDF is binary data, I would expect response.body to be bytes, not str. In that case, the simple in_memory_pdf = BytesIO(response.body) works in Python 3 as well.
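If it is unclear which type the body will have, both cases can be handled in one place. The following is only a sketch: `to_buffer` is a hypothetical helper name, and the `latin-1` default is an assumption of this example rather than part of the answer above (the `'ascii'` codec raises UnicodeEncodeError on any character outside the ASCII range, whereas `latin-1` maps code points 0-255 directly to single bytes).

```python
from io import BytesIO


def to_buffer(body, encoding="latin-1"):
    """Return a BytesIO positioned at offset 0 for bytes or str input.

    Hypothetical helper; the latin-1 default is an assumption that maps
    code points 0-255 one-to-one onto byte values.
    """
    if isinstance(body, str):
        # str must be encoded before BytesIO will accept it in Python 3
        body = body.encode(encoding)
    return BytesIO(body)
```

With this, `in_memory_pdf = to_buffer(response.body)` would work whether the body arrives as bytes or as str.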
