Compare XML snippets?

名媛妹妹 2020-11-30 03:42

Building on another SO question, how can one check whether two well-formed XML snippets are semantically equal? All I need is "equal" or not, since I'm using this for un…

10 answers
  • 2020-11-30 04:20

    SimpleTAL uses a custom xml.sax handler to compare XML documents: https://github.com/janbrohl/SimpleTAL/blob/python2/tests/TALTests/XMLTests/TALAttributeTestCases.py#L47-L112 (the results of getXMLChecksum are compared). I prefer generating a list instead of an MD5 hash, though.
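A minimal sketch of the "generate a list" variant, assuming plain Python 3 and the stdlib `xml.sax` module. This is not SimpleTAL's handler: `EventRecorder`, `xml_events` and `sax_equal` are hypothetical names. The handler records start/end/text events into a list, sorting attributes so their order is ignored and dropping whitespace-only text so indentation is ignored; two documents are "equal" when their event lists match.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Hypothetical handler: records a flat list of parse events."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def _flush_text(self):
        # characters() may be called several times per text node; join, then strip
        text = "".join(self._text).strip()
        if text:
            self.events.append(("text", text))
        self._text = []

    def startElement(self, name, attrs):
        self._flush_text()
        # Sort attributes so attribute order does not affect the result
        self.events.append(("start", name, tuple(sorted(attrs.items()))))

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", name))

    def characters(self, content):
        self._text.append(content)


def xml_events(xml_text):
    """Parse an XML string and return its recorded event list."""
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


def sax_equal(a, b):
    return xml_events(a) == xml_events(b)
```

Comparing lists rather than hashes has the side benefit that, on a mismatch, you can diff the two event lists to see where the documents diverge.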

  • 2020-11-30 04:20

    What about the following code snippet? It can easily be extended to include attributes as well:

    import xml.etree.ElementTree as ET

    def separator():
        return "!@#$%^&*"  # Very ugly separator between path and text

    def _traverseXML(xmlElem, tags, xpaths):
        # Record "root/child/.../tag" + separator + stripped text for every element
        tags.append(xmlElem.tag)
        for e in xmlElem:
            _traverseXML(e, tags, xpaths)

        text = ''
        if xmlElem.text:
            text = xmlElem.text.strip()

        xpaths.add("/".join(tags) + separator() + text)
        tags.pop()

    def _xmlToSet(xml):
        xpaths = set()  # output
        tags = list()
        root = ET.fromstring(xml)
        _traverseXML(root, tags, xpaths)

        return xpaths

    def _areXMLsAlike(xml1, xml2):
        xpaths1 = _xmlToSet(xml1)
        xpaths2 = _xmlToSet(xml2)

        return xpaths1 == xpaths2
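One possible way to do the attribute extension mentioned above, sketched with the stdlib ElementTree. Two design changes are assumptions of this sketch rather than part of the original answer: each entry is a tuple `(path, sorted attributes, text)` instead of a string glued together with a separator, and a `collections.Counter` replaces the `set`, so repeated identical siblings (e.g. two `<a/>` elements vs. one) are no longer collapsed. `element_paths` and `are_xmls_alike` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Multiset of (path, sorted attributes, stripped text), one entry per element."""
    entries = Counter()

    def walk(elem, tags):
        tags = tags + [elem.tag]
        text = (elem.text or "").strip()
        # Sorted attribute pairs make attribute order irrelevant
        attrs = tuple(sorted(elem.attrib.items()))
        entries[("/".join(tags), attrs, text)] += 1
        for child in elem:
            walk(child, tags)

    walk(ET.fromstring(xml_text), [])
    return entries


def are_xmls_alike(xml1, xml2):
    return element_paths(xml1) == element_paths(xml2)
```

Like the original, this ignores sibling order and tail text, so it is a loose "same content" check rather than a strict equality test.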
    
  • 2020-11-30 04:25

    You can use formencode.doctest_xml_compare -- the xml_compare function compares two ElementTree or lxml trees.

  • 2020-11-30 04:31

    If you take a DOM approach, you can traverse the two trees simultaneously while comparing nodes (node type, text, attributes) as you go.

    A recursive solution is the most elegant: just short-circuit further comparison once a pair of nodes is not "equal", or once you detect a leaf in one tree where the other tree has a branch, etc.
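A minimal sketch of that simultaneous recursive traversal, assuming the stdlib `xml.dom.minidom` and two choices of my own: whitespace-only text nodes and comments are skipped, and text is compared after stripping. `nodes_equal` and `dom_equal` are hypothetical names. Sibling order still matters here.

```python
from xml.dom import Node, minidom


def _significant_children(node):
    # Skip whitespace-only text nodes (indentation) and comments
    return [
        child for child in node.childNodes
        if not (child.nodeType == Node.TEXT_NODE and not child.data.strip())
        and child.nodeType != Node.COMMENT_NODE
    ]


def nodes_equal(a, b):
    if a.nodeType != b.nodeType:
        return False
    if a.nodeType == Node.TEXT_NODE:
        return a.data.strip() == b.data.strip()
    if a.nodeType == Node.ELEMENT_NODE:
        if a.tagName != b.tagName:
            return False
        # Compare attributes as dicts so their order does not matter
        if dict(a.attributes.items()) != dict(b.attributes.items()):
            return False
    ca, cb = _significant_children(a), _significant_children(b)
    if len(ca) != len(cb):  # e.g. a leaf in one tree, a branch in the other
        return False
    # all() stops at the first unequal pair of children
    return all(nodes_equal(x, y) for x, y in zip(ca, cb))


def dom_equal(xml1, xml2):
    return nodes_equal(minidom.parseString(xml1).documentElement,
                       minidom.parseString(xml2).documentElement)
```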

  • 2020-11-30 04:34

    Since the order of attributes is not significant in XML, you want to ignore differences caused by different attribute orderings. XML canonicalization (C14N) orders attributes deterministically, so you can use it to test for equality:

    import io

    import lxml.etree

    xml1 = b'''    <?xml version='1.0' encoding='utf-8' standalone='yes'?>
        <Stats start="1275955200" end="1276041599"></Stats>'''
    xml2 = b'''     <?xml version='1.0' encoding='utf-8' standalone='yes'?>
        <Stats end="1276041599" start="1275955200"></Stats>'''
    xml3 = b''' <?xml version='1.0' encoding='utf-8' standalone='yes'?>
        <Stats start="1275955200"></Stats>'''

    # strip() removes the leading whitespace, which is not allowed before the XML declaration
    tree1 = lxml.etree.fromstring(xml1.strip())
    tree2 = lxml.etree.fromstring(xml2.strip())
    tree3 = lxml.etree.fromstring(xml3.strip())

    b1 = io.BytesIO()
    b2 = io.BytesIO()
    b3 = io.BytesIO()

    # Serialize each tree in canonical form (attributes in a fixed order)
    tree1.getroottree().write_c14n(b1)
    tree2.getroottree().write_c14n(b2)
    tree3.getroottree().write_c14n(b3)

    assert b1.getvalue() == b2.getvalue()
    assert b1.getvalue() != b3.getvalue()
    

    Note that this example assumes Python 3. With Python 3, the use of b'''...''' strings and io.BytesIO is mandatory, while with Python 2 this method also works with normal strings and io.StringIO.
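If lxml is not available, Python 3.8+ ships a C14N 2.0 serializer in the standard library, `xml.etree.ElementTree.canonicalize`. The sketch below swaps it in for lxml's `write_c14n`; `c14n_equal` is a hypothetical helper name. With `strip_text=True`, leading and trailing whitespace in text content is dropped, so pretty-printing differences are ignored as well as attribute order.

```python
import xml.etree.ElementTree as ET


def c14n_equal(xml1, xml2, strip_text=True):
    """Compare two XML strings via their C14N 2.0 canonical form (Python 3.8+)."""
    # canonicalize() returns a str when no 'out' file is given
    return (ET.canonicalize(xml1, strip_text=strip_text)
            == ET.canonicalize(xml2, strip_text=strip_text))
```

Note that, as with lxml, `<Stats/>` and `<Stats></Stats>` canonicalize to the same output, while child element order is still significant.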

  • 2020-11-30 04:35

    The order of elements can be significant in XML, which may be why most of the other methods suggested here compare as unequal when the order differs, even if the elements have the same attributes and text content.

    But I also wanted an order-insensitive comparison, so I came up with this:

    import json

    from lxml import etree
    import xmltodict  # pip install xmltodict


    def normalise_dict(d):
        """
        Recursively convert dict-like object (eg OrderedDict) into plain dict.
        Sorts list values.
        """
        out = {}
        for k, v in dict(d).items():
            if hasattr(v, 'items'):
                out[k] = normalise_dict(v)
            elif isinstance(v, list):
                # Normalise the items first, then sort them by a canonical JSON
                # form, because dicts cannot be ordered directly in Python 3
                items = [normalise_dict(item) if hasattr(item, 'items') else item
                         for item in v]
                out[k] = sorted(items, key=lambda item: json.dumps(item, sort_keys=True))
            else:
                out[k] = v
        return out


    def xml_compare(a, b):
        """
        Compares two XML documents (as str/bytes or lxml element)

        Does not care about element order
        """
        if not isinstance(a, (str, bytes)):
            a = etree.tostring(a)
        if not isinstance(b, (str, bytes)):
            b = etree.tostring(b)
        a = normalise_dict(xmltodict.parse(a))
        b = normalise_dict(xmltodict.parse(b))
        return a == b
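A dependency-free sketch of the same order-insensitive idea, swapping xmltodict for the stdlib ElementTree (an assumption of this sketch, not the approach above). Each element is reduced to a hashable tuple `(tag, sorted attributes, stripped text, stripped tail, sorted children)`; sorting the children's forms makes sibling order irrelevant at every level. `_canonical` and `xml_compare_unordered` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def _canonical(elem):
    """Hashable, order-insensitive form of an element (hypothetical helper)."""
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()  # text following the element travels with it
    attrs = tuple(sorted(elem.attrib.items()))
    # Sorting the children's canonical forms makes sibling order irrelevant
    children = tuple(sorted(_canonical(child) for child in elem))
    return (elem.tag, attrs, text, tail, children)


def xml_compare_unordered(a, b):
    return _canonical(ET.fromstring(a)) == _canonical(ET.fromstring(b))
```

Unlike a set-based comparison, this keeps duplicates: two identical `<a/>` children are not equal to one.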
    