I can read all XML files that start with but I cannot read the files that start with
I have the exact same problem. My workaround is to skip the XML declaration:
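from bs4 import BeautifulSoup as Soup  # assuming Soup is an alias for BeautifulSoup

with open('tests/xml-iso.xml', 'r', encoding='iso-8859-1') as f_in:
    f_in.readline()  # skip the declaration line and let soup create its own header
    # f_in.read() is already decoded text, so from_encoding would be ignored here
    xml_soup = Soup(f_in.read(), 'xml')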
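Note that the readline() trick assumes the declaration sits alone on the first line. If the root element follows on the same line, it is dropped too. Here is a minimal sketch that removes only the declaration itself. strip_xml_declaration is a hypothetical helper name I made up, not part of Beautiful Soup:

```python
import re

# Matches an optional BOM, optional whitespace, and the XML declaration itself.
# The \s after "xml" keeps processing instructions like <?xml-stylesheet ...?> intact.
_XML_DECLARATION = re.compile(r'\A\ufeff?\s*<\?xml\s.*?\?>\s*', re.DOTALL)


def strip_xml_declaration(text):
    """Return text without a leading XML declaration (hypothetical helper)."""
    return _XML_DECLARATION.sub('', text, count=1)


# Declaration and root element on the same line: only the declaration is removed.
sample = '<?xml version="1.0" encoding="ISO-8859-1"?><root>caf\u00e9</root>'
stripped = strip_xml_declaration(sample)
```

The result could then be passed to Soup(stripped, 'xml') in place of the readline() plus read() pair.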
Coincidentally, I stumbled upon another workaround: read the file in binary mode ('rb'):
with open('tests/xml-iso.xml', 'rb') as f_in:
    xml_soup = Soup(f_in.read(), 'xml')
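My guess as to why binary mode helps: the parser receives the raw bytes and can take the encoding from the declaration itself, instead of getting text that was already decoded. The sketch below shows that principle with the standard library's xml.etree.ElementTree rather than Beautiful Soup, purely to keep it free of dependencies. The sample document and the temporary file are made up for illustration:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up sample document, stored as ISO-8859-1 bytes (\u00e9 is "e" with acute accent).
raw = '<?xml version="1.0" encoding="ISO-8859-1"?>\n<root>caf\u00e9</root>'.encode('iso-8859-1')

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'xml-iso.xml')
    with open(path, 'wb') as f_out:
        f_out.write(raw)

    # Binary mode: nothing is decoded here, so the parser sees the raw bytes
    # and reads the encoding from the XML declaration.
    with open(path, 'rb') as f_in:
        root = ET.fromstring(f_in.read())

parsed_text = root.text
```

With Beautiful Soup the call stays the same as in the snippet above: pass f_in.read() from a file opened with 'rb' and leave out from_encoding.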