Retrieve attribute names and values with Python / lxml and XPath

青春惊慌失措 2021-01-19 03:21

I am using XPath with Python lxml (Python 2). I make two passes over the data: one to select the records of interest and one to extract values from them. Here is a …

2 Answers
  • 2021-01-19 04:10

    You should try the following:

    for node in nodes:
        # node.attrib is a dict-like mapping of this element's attributes
        print(node.attrib)
    

    This prints a dict-like mapping of all of the node's attributes, for example {'id': '1', 'weight': '80', 'height': '160'}.
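Besides `attrib`, lxml elements expose their attributes through dict-style methods (`keys()`, `values()`, `items()`, `get()`). A minimal sketch, assuming the sample `<records>` document used in the other answer; the data is only for illustration:

```python
from lxml import etree

# Assumed sample data, same shape as the XML in the other answer
XML = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70"><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

root = etree.fromstring(XML)
first = root.find("row")                # first <row> child of <records>

names = first.keys()                    # list of attribute names
values = first.values()                 # list of attribute values
pairs = first.items()                   # list of (name, value) tuples
as_dict = dict(first.attrib)            # plain dict copy of the attributes
missing = first.get("missing", "n/a")   # like dict.get(), with a default

print(names)
print(pairs)
print(as_dict)
print(missing)
```

`dict(node.attrib)` is useful when you need a real, detached dictionary; `node.attrib` itself is a live view, so changes to it modify the element.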

    If you want to get something like [('@id', '1'), ('@height', '160'), ('@weight', '80')]:

    list_of_attributes = []
    for node in nodes:
        # Prefix each name with "@" to mirror XPath attribute syntax
        attrs = [("@" + name, value) for name, value in node.attrib.items()]
        list_of_attributes.append(attrs)
    

    Output:

    [[('@id', '1'), ('@height', '160'), ('@weight', '80')], [('@id', '2'), ('@weight', '70')], [('@id', '3'), ('@height', '140')]]
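The same pairs can also be collected with XPath itself: `@*` selects every attribute of the context node, and lxml returns each value as a "smart" string whose `attrname` property holds the attribute's name. A sketch, again assuming the sample `<records>` data from the other answer:

```python
from lxml import etree

# Assumed sample data, same shape as the XML in the other answer
XML = """
<records>
    <row id="1" height="160" weight="80" />
    <row id="2" weight="70"><height>150</height></row>
    <row id="3" height="140" />
</records>
"""

root = etree.fromstring(XML)

list_of_attributes = []
for node in root.xpath("/records/row"):
    # "@*" yields one smart string per attribute; .attrname is its name
    attrs = [("@" + attr.attrname, str(attr)) for attr in node.xpath("@*")]
    list_of_attributes.append(attrs)

print(list_of_attributes)
```

Like `node.attrib`, this sees only attributes: the `<height>` child element of the second row is not included.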
    
  • 2021-01-19 04:19

    I was wrong when I said I wasn't going to use Python. It turns out that the lxml/etree implementation is easy to extend, so I can keep using the XPath DSL with a few modifications.

    I registered a custom XPath extension function, dictify, and changed the XPath expression to:

    dictify('@id|@height|@weight|weight|height')
    

    The new code is:

    from lxml import etree
    
    xml = """
    <records>
        <row id="1" height="160" weight="80" />
        <row id="2" weight="70" ><height>150</height></row>
        <row id="3" height="140" />
    </records>
    """
    
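    def dictify(context, names):
        # XPath extension function: lxml passes `context` first; `names` is
        # the '|'-separated string literal given in the XPath expression
        node = context.context_node
        rv = []
        rv.append('__dictify_start_marker__')
        names = names.split('|')
        for n in names:
            if n.startswith('@'):
                # "@name": look up an attribute on the current node
                val = node.attrib.get(n[1:])
                if val is not None:
                    rv.append(n)
                    rv.append(val)
            else:
                # "name": collect the text of every matching child element
                children = node.findall(n)
                for child_node in children:
                    rv.append(n)
                    rv.append(child_node.text)
        rv.append('__dictify_end_marker__')
        return rv
    
    # Register dictify in the default (empty) namespace so XPath
    # expressions can call it without a prefix
    etree_functions = etree.FunctionNamespace(None)
    etree_functions['dictify'] = dictify
    
    
    parsed = etree.fromstring(xml)
    nodes = parsed.xpath('/records/row')
    for node in nodes:
        print(node.xpath("dictify('@id|@height|@weight|weight|height')"))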
    

    This produces the following output:

    ['__dictify_start_marker__', '@id', '1', '@height', '160', '@weight', '80', '__dictify_end_marker__']
    ['__dictify_start_marker__', '@id', '2', '@weight', '70', 'height', '150', '__dictify_end_marker__']
    ['__dictify_start_marker__', '@id', '3', '@height', '140', '__dictify_end_marker__']
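Because `dictify` returns a flat list, the caller still has to pair names with values. A minimal sketch of that post-processing step; `dictify_pairs` is a hypothetical helper name, and it assumes the start/end marker strings used above and that every name is followed by exactly one value:

```python
def dictify_pairs(flat):
    """Turn one flat dictify() result into a list of (name, value) tuples.

    Assumes a single '__dictify_start_marker__' ... '__dictify_end_marker__'
    group with alternating name/value items in between.
    """
    start = flat.index("__dictify_start_marker__") + 1
    end = flat.index("__dictify_end_marker__")
    # str() turns lxml smart strings into plain strings
    items = [str(item) for item in flat[start:end]]
    # Even positions are names, odd positions are the matching values
    return list(zip(items[0::2], items[1::2]))


# The second row's output shown above
row = ['__dictify_start_marker__', '@id', '2', '@weight', '70',
       'height', '150', '__dictify_end_marker__']
pairs = dictify_pairs(row)
print(pairs)
print(dict(pairs))
```

Returning a list of pairs rather than a dict keeps repeated child names (several `<height>` children, say), which `dict(pairs)` would collapse to the last value.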
    