BeautifulSoup does not parse XML with an encoding other than UTF-8

花落未央 2021-01-19 18:14

I can read all XML files that start with a UTF-8 XML declaration, but I cannot read files whose declaration specifies a different encoding.

2 answers
  • 2021-01-19 18:37

    I have exactly the same problem. My workaround is to skip the XML declaration:

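    from bs4 import BeautifulSoup as Soup  # the 'xml' parser requires lxml

    with open('tests/xml-iso.xml', 'r', encoding='iso-8859-1') as f_in:
        f_in.readline()  # skip the declaration (assumed to be the whole first line); soup writes its own header
        # f_in.read() is an already-decoded str, so from_encoding is dropped (bs4 ignores it for str input)
        xml_soup = Soup(f_in.read(), 'xml')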
    
  • 2021-01-19 18:51

    Coincidentally, I stumbled upon another workaround: read the file in binary mode ('rb'), so BeautifulSoup receives the raw bytes and detects the encoding from the XML declaration itself:

    from bs4 import BeautifulSoup as Soup

    with open('tests/xml-iso.xml', 'rb') as f_in:  # bytes input: bs4 detects the declared encoding
        xml_soup = Soup(f_in.read(), 'xml')
    
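The first answer's `readline()` call assumes the XML declaration occupies exactly the first line of the file. A slightly more defensive variant, sketched below, strips a leading declaration only if one is actually present. `strip_xml_declaration` is a hypothetical helper, not part of bs4, and the regular expression is a simplification rather than a full check of the XML grammar.

```python
import re

# Hypothetical helper (not part of bs4): matches a leading XML declaration such as
# <?xml version="1.0" encoding="ISO-8859-1"?>, optionally preceded by a BOM.
# Requiring whitespace right after "xml" keeps it from matching <?xml-stylesheet ...?>.
_XML_DECLARATION = re.compile(r'^\ufeff?<\?xml\s[^>]*\?>')


def strip_xml_declaration(text: str) -> str:
    """Return already-decoded XML text with its leading declaration removed, if any."""
    return _XML_DECLARATION.sub('', text, count=1)


if __name__ == '__main__':
    sample = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<root>text</root>'
    # The declaration is removed; the newline and the root element are kept.
    print(repr(strip_xml_declaration(sample)))
```

With the file opened in text mode as in the first answer (`encoding='iso-8859-1'`), the `readline()` line could then be replaced by passing `strip_xml_declaration(f_in.read())` to `Soup(..., 'xml')`.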
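The second answer works because, given bytes, BeautifulSoup can read the encoding named in the XML declaration before decoding anything (internally this is done by its `UnicodeDammit` class, which also considers byte-order marks and other fallbacks). A much-simplified, standard-library-only sketch of that idea follows; `decode_xml_bytes` is a hypothetical name, and unlike bs4 it does no guessing beyond the declaration and a UTF-8 default.

```python
import re

# Hypothetical helper (not part of bs4): reads encoding="..." from a leading
# XML declaration in the raw bytes, e.g. b'<?xml version="1.0" encoding="ISO-8859-1"?>'.
_DECLARED_ENCODING = re.compile(rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def decode_xml_bytes(data: bytes, default: str = 'utf-8') -> str:
    """Decode raw XML bytes with the encoding their declaration names, else `default`.

    Raises LookupError if the declared encoding is unknown to Python and
    UnicodeDecodeError if the bytes are not valid in that encoding.
    """
    match = _DECLARED_ENCODING.match(data)
    encoding = match.group(1).decode('ascii') if match else default
    return data.decode(encoding)


if __name__ == '__main__':
    raw = '<?xml version="1.0" encoding="ISO-8859-1"?><root>café</root>'.encode('iso-8859-1')
    # Prints True: the byte 0xE9 was decoded as ISO-8859-1, giving "é".
    print(decode_xml_bytes(raw).endswith('café</root>'))
```

This is for illustration only. In practice, passing the bytes straight to `Soup`, as the second answer does, is simpler; feeding the decoded string (declaration included) back into `Soup` would likely recreate the situation described in the question.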