Parse HTML in Android

逝去的感伤 2020-11-22 06:22

I am trying to parse HTML from a webpage on Android, and since the webpage is not well formed, I get a SAXException.
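
For illustration, the failure described above can be reproduced with a minimal sketch. It assumes a desktop JVM with the standard `javax.xml.parsers` SAX parser (the parser implementation on Android may behave differently), and the class name `StrictParseDemo`, the helper `parsesAsXml`, and the sample markup are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows a strict XML parser rejecting typical real-world HTML.
class StrictParseDemo {

    // Returns true if the markup parses as XML, false if the parser throws SAXException.
    static boolean parsesAsXml(String markup) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content events; its fatalError() rethrows the parse error.
            parser.parse(
                new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Malformed markup (unclosed or mismatched tags) ends up here.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unclosed <br> and <p> are common in HTML but are not well-formed XML.
        String sloppyHtml = "<html><body><p>One<br><p>Two</body></html>";
        String strictXhtml = "<html><body><p>One<br/></p><p>Two</p></body></html>";
        System.out.println("sloppy HTML parses: " + parsesAsXml(sloppyHtml));
        System.out.println("strict XHTML parses: " + parsesAsXml(strictXhtml));
    }
}
```

The first input leaves `<br>` and `<p>` unclosed, so the strict parser gives up at the mismatched `</body>`; the second closes every element and is accepted.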

Is there a way to parse HTML in Andro

5 Answers
  •  -上瘾入骨i
    2020-11-22 06:41

    I just ran into this problem. I tried a few things, but settled on jsoup. The jar is about 132 KB, which is a bit big, but if you download the source and strip out the methods you will not be using, it gets smaller.
    The good thing about it is that it handles badly formed HTML.

    Here's a good example from their site.

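    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Parse a local file; the third argument is the base URI used to resolve relative links.
    // Jsoup.parse(File, ...) throws IOException, so call it from a method that handles it.
    File input = new File("/tmp/input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

    // To load from a URL instead, see http://jsoup.org/cookbook/input/load-document-from-url
    // Document doc = Jsoup.connect("http://example.com/").get();

    // getElementById returns null if no element has id="content".
    Element content = doc.getElementById("content");
    Elements links = content.getElementsByTag("a");
    for (Element link : links) {
      String linkHref = link.attr("href"); // value of the href attribute
      String linkText = link.text();       // visible text of the link
    }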
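
    To see the handling of badly formed HTML in isolation, here is a small sketch that parses a string with unclosed tags instead of a file. It assumes the jsoup jar is on the classpath; the class name `LenientParseDemo`, the helper `extractLinks`, and the sample markup are invented for this example and do not come from the jsoup site.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // Hypothetical demo class: extracts absolute link URLs from sloppy HTML with jsoup.
    class LenientParseDemo {

        // Parses the HTML string and returns every link target, resolved against baseUri.
        static List<String> extractLinks(String html, String baseUri) {
            Document doc = Jsoup.parse(html, baseUri);
            List<String> urls = new ArrayList<>();
            // "a[href]" selects only anchors that actually carry an href attribute.
            for (Element link : doc.select("a[href]")) {
                urls.add(link.absUrl("href"));
            }
            return urls;
        }

        public static void main(String[] args) {
            // The <div> and both <p> elements are never closed, and the id is unquoted.
            String html = "<div id=content><p>One <a href=\"/a\">first</a>"
                        + "<p>Two <a href=\"/b\">second</a>";
            for (String url : extractLinks(html, "http://example.com/")) {
                System.out.println(url);
            }
        }
    }
    ```

    No exception is thrown for the missing closing tags, and `absUrl("href")` turns the relative links into full URLs using the base URI passed to `Jsoup.parse`.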
    
