BeautifulSoup raises AttributeError when XML tag name contains capital letters

再見小時候 2021-01-24 03:58

I'm trying to get all the XML attributes for the tag Name.

Getting this error:

AttributeError: 'NoneType' object has no attribute 'attr


        
2 Answers
  • 2021-01-24 04:38

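    In BeautifulSoup 4, you can parse the document with the XML parser, which keeps tag names in their original case:

    import bs4 as bs  # the alias "bs" is assumed from the call below

    doc = bs.BeautifulSoup(xml, "xml")
    div = doc.find("Name")

    With the "xml" parser, find("Name") should match the tag as written. Note that this parser requires lxml to be installed.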

  • 2021-01-24 04:46

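    BeautifulSoup is primarily an HTML-parsing library. It can handle XML too, but its HTML parsers lowercase all tag names, because HTML tag names are case-insensitive. As a result, find("Name") matches nothing and returns None, and accessing an attribute on that None raises the AttributeError. Quoting the BeautifulSoup documentation: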

    Because HTML tags and attributes are case-insensitive, all three HTML parsers convert tag and attribute names to lowercase. That is, the markup <TAG></TAG> is converted to <tag></tag>. If you want to preserve mixed-case or uppercase tags and attributes, you’ll need to parse the document as XML.
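
    The lowercasing described in the quote can be observed without BeautifulSoup, using the standard library's html.parser module (one of the parsers BeautifulSoup can delegate to). This is only a minimal sketch: the SAMPLE document, the TagCollector class, and the html_start_tags helper are hypothetical names made up for illustration, since the question's actual XML is not shown.

```python
from html.parser import HTMLParser

# Hypothetical sample document; the XML from the question is not shown.
SAMPLE = '<Root><Name First="Ada" Last="Lovelace"></Name></Root>'


class TagCollector(HTMLParser):
    """Record each start tag and its attributes as the HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names already lowercased.
        self.seen.append((tag, dict(attrs)))


def html_start_tags(markup):
    """Return a list of (tag, attributes) pairs in document order."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


for tag, attrs in html_start_tags(SAMPLE):
    print(tag, attrs)
```

    Under an HTML parser the tag is reported as name rather than Name, and the attribute names are lowercased too (the values keep their case), so a case-sensitive lookup for "Name" has nothing to match.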

    There is an XML mode in which tags are matched case-sensitively and are not lowercased, but it requires the lxml library to be installed. Because lxml is a C-extension library, it is not supported on Google App Engine.

    Use the ElementTree API instead:

    import xml.etree.ElementTree as ET

    root = ET.fromstring(xml)
    div = root.find('.//Name')  # tag names are matched case-sensitively

    for attr, val in div.items():
        print("%s:%s" % (attr, val))
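
    A slightly more defensive variation on the ElementTree approach is sketched below. It guards against the tag being absent, so a missing tag yields an empty result instead of an AttributeError on None. The SAMPLE document and the tag_attributes helper are hypothetical, introduced only for illustration. The sketch uses Element.iter() rather than find('.//Name') so that the root element itself is also considered.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the XML from the question is not shown.
SAMPLE = '<Root><Name First="Ada" Last="Lovelace"/></Root>'


def tag_attributes(xml_text, tag):
    """Return the attributes of the first <tag> element as a dict, or {} if absent."""
    root = ET.fromstring(xml_text)
    # iter() walks the root and all of its descendants, matching case-sensitively.
    element = next(root.iter(tag), None)
    if element is None:
        return {}
    return dict(element.attrib)


for attr, val in tag_attributes(SAMPLE, "Name").items():
    print("%s:%s" % (attr, val))

# A lookup with the wrong case finds nothing, because XML names are case-sensitive.
print(tag_attributes(SAMPLE, "name"))
```

    Returning an empty dict for a missing tag is a design choice for this sketch; raising an explicit error may suit callers that treat the tag as mandatory.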
    