I am a beginner in Java, and my first task is to parse some 10,000 URLs and extract some info out of them. For this I am using Jsoup, and it's working fine.
You don't have to fetch the page data through Jsoup itself. Here's my solution; it may not be the best, though.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

URL url = new URL("http://www.example.com/");
Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("127.0.0.1", 8080)); // or whatever your proxy is
HttpURLConnection uc = (HttpURLConnection) url.openConnection(proxy);
uc.connect();
// Read the raw HTML; re-append the line break that readLine() strips
StringBuilder tmp = new StringBuilder();
try (BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()))) {
    String line;
    while ((line = in.readLine()) != null) {
        tmp.append(line).append('\n');
    }
}
Document doc = Jsoup.parse(tmp.toString());
And there it is. This fetches the HTML source of the page through a proxy and then parses it with Jsoup.
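If you happen to be on a recent Jsoup release (1.9 or later), its Connection API can take a proxy directly, so you may be able to skip HttpURLConnection altogether. A minimal sketch, assuming the same local proxy on 127.0.0.1:8080 and a Jsoup version that has Connection.proxy(host, port):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Jsoup fetches the page through the proxy and parses it in one call
// (proxy(String, int) is available in Jsoup 1.9+)
Document doc = Jsoup.connect("http://www.example.com/")
        .proxy("127.0.0.1", 8080)
        .get();

Either way works; the HttpURLConnection approach above just gives you more control over the raw connection if you need it.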