How to fetch HTML in Java

不思量自难忘° 2020-12-05 13:35

Without the use of any external library, what is the simplest way to fetch a website's HTML content into a String?

5 answers
  • 2020-12-05 13:59

    It's not a library but a tool: curl is installed on most servers, and on Ubuntu you can easily install it with

    sudo apt install curl
    

    Then fetch any HTML page and save it to a local file, for example

    curl https://www.facebook.com/ > fb.html
    

    You will get the home page's HTML. You can open the saved file in your browser as well.

  • 2020-12-05 14:05

    I just left this post in your other thread, though what you have above might work as well. I don't think either would be any easier than the other. Apache Commons HttpClient is an external library, so its jar has to be on the classpath; after that it can be used by adding import org.apache.commons.httpclient.HttpClient at the top of your code.

    Edit: Forgot the link ;)

  • 2020-12-05 14:13

    I'm currently using this:

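    import java.net.URL;
    import java.net.URLConnection;
    import java.util.Scanner;

    String content = null;
    try {
      URLConnection connection = new URL("http://www.google.com").openConnection();
      // UTF-8 is an assumption; without a charset the platform default is used.
      // try-with-resources closes the scanner (and the stream) even on failure.
      try (Scanner scanner = new Scanner(connection.getInputStream(), "UTF-8")) {
        // "\\Z" matches end of input, so next() returns the whole body as one token
        scanner.useDelimiter("\\Z");
        content = scanner.hasNext() ? scanner.next() : "";
      }
    } catch (Exception ex) {
      ex.printStackTrace();
    }
    System.out.println(content);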
    

    But not sure if there's a better way.
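
    One possible alternative, assuming Java 11 or later, is the JDK's built-in java.net.http.HttpClient, which needs no external library either. The sketch below is only an illustration: the class name HttpClientFetch, the timeout values, and the choice to treat non-2xx responses as errors are my own assumptions, not something from the code above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class; the name and timeouts are illustrative assumptions.
final class HttpClientFetch {

    // Fetches the body of the given URL as a String (requires Java 11+).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();

        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();

        // ofString() decodes with the charset from the Content-Type header,
        // falling back to UTF-8 when the header gives none.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Assumption: anything outside 2xx is reported as an error.
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

    Calling HttpClientFetch.fetch("http://www.google.com") would then return the page as a String. Compared with the Scanner version, this handles the response charset and redirects for you, at the cost of requiring a newer JDK.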

  • 2020-12-05 14:17

    This has worked well for me:

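    // Needs java.io.*, java.net.URL and java.nio.charset.StandardCharsets
    URL url = new URL(theURL);
    StringBuilder buffer = new StringBuilder();
    // Decode the bytes as UTF-8 (an assumption) instead of casting each byte
    // to char, which corrupts non-ASCII text; buffering avoids one read per byte
    try (Reader in = new BufferedReader(
            new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
        int ptr;
        while ((ptr = in.read()) != -1) {
            buffer.append((char) ptr);
        }
    }
    String html = buffer.toString();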
    

    Not sure as to whether the other solutions provided are any more efficient.
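
    On the efficiency question, a character-at-a-time loop is usually not the fastest option. A minimal sketch of a bulk read, assuming Java 9 or later for InputStream.readAllBytes(), could look like the following. The class name BulkRead and the UTF-8 default are illustrative assumptions; a real page may declare a different charset.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
final class BulkRead {

    // Reads the whole stream in one call and decodes it with the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    // Opens the URL, reads everything, and closes the stream afterwards.
    // Assumption: the content is UTF-8 encoded.
    static String fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }
}
```

    With this, BulkRead.fetch(new URL(theURL)) returns the page in one step. The whole body is held in memory at once, which is fine for typical HTML pages but not for very large downloads.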

  • 2020-12-05 14:18

    Whilst not vanilla-Java, I'll offer up a simpler solution. Use Groovy ;-)

    String siteContent = new URL("http://www.google.com").text
    