Does jsoup support XPath?

迷失自我 2020-12-08 19:24

There's some work in progress on adding XPath support to jsoup: https://github.com/jhy/jsoup/pull/80

  • Is it working?
  • How can I use it?
3 answers
  • 2020-12-08 19:55

    JSoup doesn't support XPath yet, but you may try XSoup - "Jsoup with XPath".

    Here's an example quoted from the project's GitHub site (link):

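    import java.util.List;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.junit.Assert;
    import org.junit.Test;
    import us.codecraft.xsoup.Xsoup;

    @Test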
    public void testSelect() {
    
        String html = "<html><div><a href='https://github.com'>github.com</a></div>" +
                "<table><tr><td>a</td><td>b</td></tr></table></html>";
    
        Document document = Jsoup.parse(html);
    
        String result = Xsoup.compile("//a/@href").evaluate(document).get();
        Assert.assertEquals("https://github.com", result);
    
        List<String> list = Xsoup.compile("//tr/td/text()").evaluate(document).list();
        Assert.assertEquals("a", list.get(0));
        Assert.assertEquals("b", list.get(1));
    }
    

    There you'll also find a list of features and expressions of XPath that are supported by XSoup.
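
To sanity-check what those two expressions should return, here is a minimal sketch that runs them through the JDK's built-in `javax.xml.xpath` instead of Xsoup. It assumes the markup is well-formed XML, which real-world HTML often is not, so it is a cross-check rather than a replacement for an HTML-aware library. The class and method names (`JdkXPathSketch`, `parse`, `first`, `all`) are made up for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of jsoup or Xsoup.
public class JdkXPathSketch {

    // Parses well-formed markup with the JDK's XML parser (this is not an HTML parser).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Markup is not well-formed XML", e);
        }
    }

    // Returns the string value of the first node matched by the expression.
    static String first(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Invalid XPath: " + expr, e);
        }
    }

    // Returns the node value of every node matched by the expression.
    static List<String> all(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Invalid XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Same markup as in the Xsoup test above.
        String html = "<html><div><a href='https://github.com'>github.com</a></div>"
                + "<table><tr><td>a</td><td>b</td></tr></table></html>";
        Document doc = parse(html);
        System.out.println(first(doc, "//a/@href"));
        System.out.println(all(doc, "//tr/td/text()"));
    }
}
```

Because the JDK parser rejects unclosed tags and unescaped `&`, this only works on tidy input. Libraries such as Xsoup are useful precisely because they evaluate XPath over jsoup's lenient HTML parse.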

  • 2020-12-08 19:57

    Not yet, but the JsoupXpath project has implemented it. For example:

    String html = "<html><body><script>console.log('aaaaa')</script><div class='test'>some body</div><div class='xiao'>Two</div></body></html>";
    JXDocument underTest = JXDocument.create(html);
    String xpath = "//div[contains(@class,'xiao')]/text()";
    JXNode node = underTest.selNOne(xpath);
    Assert.assertEquals("Two",node.asString());
    

    By the way, it supports the complete W3C XPath 1.0 standard syntax, plus extension functions such as num() and allText() that are not part of XPath 1.0. For example:

    //ul[@class='subject-list']/li[./div/div/span[@class='pl']/num()>(1000+90*(2*50))][last()][1]/div/h2/allText()
    //ul[@class='subject-list']/li[not(contains(self::li/div/div/span[@class='pl']//text(),'14582'))]/div/h2//text()
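
The standard XPath 1.0 pieces used above (`contains()`, `not()`, `last()`) can be tried without any third-party library. The following is a minimal sketch using the JDK's `javax.xml.xpath` on the sample markup from this answer, assuming that markup is well-formed XML. The extension functions `num()` and `allText()` are not available in the JDK evaluator, so they are left out. `XPathPredicateSketch` and `eval` are hypothetical names for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JsoupXpath.
public class XPathPredicateSketch {

    // Parses well-formed markup and returns the string value of the first match.
    static String eval(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><script>console.log('aaaaa')</script>"
                + "<div class='test'>some body</div><div class='xiao'>Two</div></body></html>";

        // contains(): divs whose class attribute contains 'xiao'
        System.out.println(eval(html, "//div[contains(@class,'xiao')]/text()"));
        // not(): divs whose class attribute does not contain 'xiao'
        System.out.println(eval(html, "//div[not(contains(@class,'xiao'))]/text()"));
        // last(): the last div among its siblings
        System.out.println(eval(html, "//div[last()]/text()"));
    }
}
```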
    
  • 2020-12-08 20:22

    HtmlUnit supports XPath. I've used this for certain projects and it works reasonably well.
