Convert io.BytesIO to io.StringIO to parse HTML page

南笙 2020-12-29 22:39

I'm trying to parse an HTML page I retrieved through pyCurl, but the pyCurl WRITEFUNCTION returns the page as bytes rather than a string, so I'm unable to parse it using Bea…
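A minimal sketch of the situation described above, assuming pyCurl is replaced by a hypothetical `write_callback` that receives raw byte chunks (with a real handle you would pass `buffer.write` via `c.setopt(pycurl.WRITEFUNCTION, buffer.write)`). Since the parser named in the question is cut off, the standard library's `html.parser.HTMLParser` stands in as an example of a parser that expects `str`:

```python
import io
from html.parser import HTMLParser

# Hypothetical stand-in for pycurl: with a real handle you would pass
# buffer.write via c.setopt(pycurl.WRITEFUNCTION, buffer.write)
buffer = io.BytesIO()
write_callback = buffer.write

# Simulated response body, delivered in two chunks; the split at byte 23
# falls inside the two-byte UTF-8 encoding of "é"
body = "<html><head><title>Café</title></head></html>".encode("utf-8")
for chunk in (body[:23], body[23:]):
    write_callback(chunk)

page = buffer.getvalue()
print(type(page))  # <class 'bytes'>

# A str-based parser cannot consume the raw bytes
try:
    HTMLParser().feed(page)
except TypeError as exc:
    print("TypeError:", exc)

# Decoding each chunk on its own fails when a character spans two chunks
try:
    body[:23].decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)
```

The last part is why the body should be decoded either as a whole or with an incremental decoder, not chunk by chunk with `bytes.decode`.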

2 Answers
  • 2020-12-29 22:59

    A naive approach:

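    import io
    
    # Example input; in practice bytes_io is the BytesIO buffer pycurl wrote into
    bytes_io = io.BytesIO('<p>ÁÇÊ</p>'.encode('UTF-8'))
    
    # getvalue() returns the whole buffer regardless of the stream position;
    # read() would return b'' if the position is still at the end after writing
    byte_str = bytes_io.getvalue()
    
    # Convert to a str (Python 3's Unicode text type)
    text_obj = byte_str.decode('UTF-8')  # Or use the encoding you expect
    
    # Use text_obj how you see fit!
    # io.StringIO(text_obj) will get you a StringIO object if that's what you need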
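The answer hard-codes UTF-8 and says to use the encoding you expect. One way to choose that encoding is the `charset` parameter of the response's `Content-Type` header. A minimal sketch, assuming you already have that header value as a string (for example from `c.getinfo(pycurl.CONTENT_TYPE)`, not shown); the helper names are hypothetical, and the standard library's `email.message.Message` is used only to parse the header parameters:

```python
import io
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the charset and falls back to `default`
    return msg.get_content_charset(default)


def bytesio_to_stringio(bytes_io, content_type="text/html"):
    """Decode a BytesIO body into a StringIO using the header's charset."""
    encoding = charset_from_content_type(content_type)
    # errors="replace" keeps going if the server mislabels its encoding
    text = bytes_io.getvalue().decode(encoding, errors="replace")
    return io.StringIO(text)


# Example: a Latin-1 body whose header says so
body = io.BytesIO("<p>Café</p>".encode("latin-1"))
page = bytesio_to_stringio(body, "text/html; charset=ISO-8859-1")
print(page.read())  # <p>Café</p>
```

Without a `charset` parameter the sketch falls back to UTF-8; whether that default suits the pages you fetch is a judgment call.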
    
  • 2020-12-29 23:22

    The code in the accepted answer reads the entire stream into memory before decoding it. A better approach is to convert one stream into another with io.TextIOWrapper, so the data can be read and decoded chunk by chunk.

    import io
    
    # Initialize a read buffer
    raw = io.BytesIO(
        b'Initial value for read buffer with unicode characters ' +
        'ÁÇÊ'.encode('utf-8')
    )
    wrapper = io.TextIOWrapper(raw, encoding='utf-8')
    
    # Read from the buffer; the wrapper decodes the bytes to str
    print(wrapper.read())
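Calling `wrapper.read()` with no argument still returns everything at once. A minimal sketch of reading in chunks, assuming the buffer was filled by a write callback (simulated here with a plain `write`). `TextIOWrapper.read(n)` returns at most `n` characters and decodes incrementally, so a multi-byte character such as `Á` is never split between chunks. Note the `seek(0)`: after writing, the position is at the end of the buffer, and a wrapper created there would read nothing.

```python
import io

# Stand-in for the buffer a WRITEFUNCTION callback appended to
raw = io.BytesIO()
raw.write("Chunked read: ÁÇÊ".encode("utf-8"))
raw.seek(0)  # rewind before reading; writing left the position at the end

wrapper = io.TextIOWrapper(raw, encoding="utf-8")

pieces = []
while True:
    chunk = wrapper.read(4)  # at most 4 characters (not bytes) per call
    if not chunk:  # empty string means end of stream
        break
    pieces.append(chunk)

text = "".join(pieces)
print(pieces)
```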
    