Finding elements by attribute with lxml

轮回少年 2021-01-30 12:40

I need to parse an XML file to extract some data. I only need the elements that have certain attributes. Here's an example of the document:


    

        
2 answers
  • 2021-01-30 13:10

    You can use XPath, e.g. root.xpath("//article[@type='news']").

    This XPath expression returns a list of all <article/> elements whose "type" attribute has the value "news". You can then iterate over the list or pass it on to other code.
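
As a minimal sketch of iterating over that list: each result is an ordinary element, so its attributes and children can be read directly. The sample document and variable names below are illustrative assumptions, not taken from the question.

```python
from lxml import etree

# Illustrative sample document (an assumption; the question's own file is not shown).
root = etree.fromstring("""
<root>
    <articles>
        <article type="news"><content>first</content></article>
        <article type="info"><content>second</content></article>
        <article type="news"><content>third</content></article>
    </articles>
</root>
""")

# xpath() returns a plain Python list of the matching elements.
news_articles = root.xpath("//article[@type='news']")

for article in news_articles:
    # .get() reads an attribute; findtext() reads the text of a child element.
    print(article.get("type"), article.findtext("content"))
```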

    To get just the text content, you can extend the XPath expression like so:

    from lxml import etree

    root = etree.fromstring("""
    <root>
        <articles>
            <article type="news">
                 <content>some text</content>
            </article>
            <article type="info">
                 <content>some text</content>
            </article>
            <article type="news">
                 <content>some text</content>
            </article>
        </articles>
    </root>
    """)
    
    print(root.xpath("//article[@type='news']/content/text()"))
    

    This outputs ['some text', 'some text']. If you want the content elements themselves rather than their text, use "//article[@type='news']/content" instead.
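
If the attribute value is only known at run time, one option is to pass it as an XPath variable instead of formatting it into the expression string. The sketch below relies on lxml binding XPath variables from keyword arguments to xpath(); the helper name texts_for_type and the sample data are hypothetical.

```python
from lxml import etree

# Illustrative sample document (an assumption, not the question's original file).
root = etree.fromstring("""
<root>
    <articles>
        <article type="news"><content>first</content></article>
        <article type="info"><content>second</content></article>
        <article type="news"><content>third</content></article>
    </articles>
</root>
""")


def texts_for_type(tree, article_type):
    """Return the <content> text of every article with the given type (hypothetical helper)."""
    # $wanted is an XPath variable; lxml fills it in from the keyword argument.
    return tree.xpath("//article[@type=$wanted]/content/text()", wanted=article_type)


print(texts_for_type(root, "news"))
```

Binding the value as a variable avoids quoting problems when the value itself contains quote characters.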

  • 2021-01-30 13:32

    Just for reference, you can achieve the same result with findall:

    from lxml import etree

    root = etree.fromstring("""
    <root>
        <articles>
            <article type="news">
                 <content>some text</content>
            </article>
            <article type="info">
                 <content>some text</content>
            </article>
            <article type="news">
                 <content>some text</content>
            </article>
        </articles>
    </root>
    """)
    
    # Locate the <articles> container, then match its <article> children by attribute.
    articles = root.find("articles")
    article_list = articles.findall("article[@type='news']/content")
    for a in article_list:
        print(a.text)
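
A possible variation, sketched on the assumption that the articles may sit at any depth: starting the path with ".//" makes findall search all descendants of root, so the separate find("articles") step is not needed. The sample data is illustrative.

```python
from lxml import etree

# Illustrative sample document (an assumption, not the question's original file).
root = etree.fromstring("""
<root>
    <articles>
        <article type="news"><content>first</content></article>
        <article type="info"><content>second</content></article>
        <article type="news"><content>third</content></article>
    </articles>
</root>
""")

# ".//" searches every descendant of root, not only its direct children.
contents = root.findall(".//article[@type='news']/content")
news_texts = [c.text for c in contents]
print(news_texts)
```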
    