Parse HTML in Android

逝去的感伤 2020-11-22 06:22

I am trying to parse HTML from a webpage on Android, and since the webpage is not well formed, I get a SAXException.

Is there a way to parse HTML in Android?
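
For context, the SAXException is what a strict XML parser reports when it meets markup that is not well formed. The sketch below reproduces that with the JDK's own `javax.xml.parsers` API rather than anything Android-specific; the class name `StrictParseDemo`, the helper `isWellFormed`, and both sample strings are made up for illustration and are not taken from the page in question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows a strict XML parser rejecting typical "tag soup" HTML.
final class StrictParseDemo {

    /** Returns true if the markup parses as well-formed XML, false if the parser rejects it. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // Thrown for unclosed or mismatched tags and similar problems.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Every tag is closed, so the strict parser accepts it.
        System.out.println(isWellFormed("<html><body><p>ok</p></body></html>"));
        // <br> is never closed, which browsers tolerate but XML does not.
        System.out.println(isWellFormed("<html><body><p>unclosed<br></body></html>"));
    }
}
```

A lenient HTML parser, unlike this one, is expected to repair such input instead of throwing.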

5 Answers
  • 2020-11-22 06:41

    I just encountered this problem. I tried a few things but settled on jsoup. The jar is about 132 KB, which is a bit big, but you can shrink it by downloading the source and removing the methods you will not use.
    The good thing about it is that it handles badly formed HTML.

    Here's a good example from their site.

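    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Parse a local file; the third argument is the base URI used to resolve relative links.
    File input = new File("/tmp/input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

    // Alternatively, fetch and parse straight from a URL:
    // http://jsoup.org/cookbook/input/load-document-from-url
    // Document doc = Jsoup.connect("http://example.com/").get();

    // Collect every link inside the element with id "content".
    Element content = doc.getElementById("content"); // null if no such element exists
    Elements links = content.getElementsByTag("a");
    for (Element link : links) {
      String linkHref = link.attr("href"); // raw value of the href attribute
      String linkText = link.text();       // visible text of the link
    }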
    
  • 2020-11-22 06:42
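    // Html.fromHtml returns a Spanned; toString() drops the styling and leaves plain text.
    // Note: this single-argument overload is deprecated since API 24 in favor of fromHtml(String, int).
    String tmpHtml = "<html>a whole bunch of html stuff</html>";
    String htmlTextStr = Html.fromHtml(tmpHtml).toString();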
    
  • 2020-11-22 06:45

    Have you tried using Html.fromHtml(source)?

    I think that class is pretty liberal with respect to source quality (it uses TagSoup internally, which was designed with real-life, bad HTML in mind). It doesn't support all HTML tags, but it does come with a handler (Html.TagHandler) that you can implement to react to tags it doesn't understand.

  • 2020-11-22 06:49

    There are usually many solutions to a single problem, and the answers above may well help someone, but this is the one that saved my day.

    The code goes like this:

      // Fetches the page on a background thread (Android forbids network I/O on the
      // main thread), then posts the result back to the UI thread.
      private void getWebsite() {
        new Thread(new Runnable() {
          @Override
          public void run() {
            final StringBuilder builder = new StringBuilder();

            try {
              Document doc = Jsoup.connect("http://www.ssaurel.com/blog").get();
              String title = doc.title();
              Elements links = doc.select("a[href]"); // every <a> that has an href

              builder.append(title).append("\n");

              for (Element link : links) {
                builder.append("\n").append("Link : ").append(link.attr("href"))
                    .append("\n").append("Text : ").append(link.text());
              }
            } catch (IOException e) {
              builder.append("Error : ").append(e.getMessage()).append("\n");
            }

            // result is presumably a TextView field of the Activity; views may only
            // be touched from the UI thread.
            runOnUiThread(new Runnable() {
              @Override
              public void run() {
                result.setText(builder.toString());
              }
            });
          }
        }).start();
      }
    

    You just have to call the method above from the onCreate method of your MainActivity.

    I hope this one is also helpful for you guys.

    Also read the original blog at Medium

  • 2020-11-22 06:49

    Maybe you can use WebView, but as the documentation notes, WebView does not support JavaScript or other features such as widgets by default.

    http://developer.android.com/reference/android/webkit/WebView.html

    I think you can enable JavaScript if you need it (WebSettings.setJavaScriptEnabled(true)).
