I am writing a small Java program to get the amount of results for a given Google search term. For some reason, in Java I am getting a 403 Forbidden but I am getting the rig
It's because the site uses SSL. Try using the Jersey HTTP Client. You will probably also have to learn a little about HTTPS and the certificates, but I think Jersey can bet set to ignore most of the details relating to the actual security.
You probably aren't setting the correct headers. Use LiveHttpHeaders
(or equivalent) in the browser to see what headers the browser is sending, then emulate them in your code.
For me it worked by adding the header: "Accept": "*/*"
You just need to set user agent header for it to work:
URLConnection connection = new URL("https://www.google.com/search?q=" + query).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.connect();
BufferedReader r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
sb.append(line);
}
System.out.println(sb.toString());
The SSL was transparently handled for you as could be seen from your exception stacktrace.
Getting the result amount is not really this simple though, after this you have to fake that you're a browser by fetching the cookie and parsing the redirect token link.
String cookie = connection.getHeaderField( "Set-Cookie").split(";")[0];
Pattern pattern = Pattern.compile("content=\\\"0;url=(.*?)\\\"");
Matcher m = pattern.matcher(response);
if( m.find() ) {
String url = m.group(1);
connection = new URL(url).openConnection();
connection.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.95 Safari/537.11");
connection.setRequestProperty("Cookie", cookie );
connection.connect();
r = new BufferedReader(new InputStreamReader(connection.getInputStream(), Charset.forName("UTF-8")));
sb = new StringBuilder();
while ((line = r.readLine()) != null) {
sb.append(line);
}
response = sb.toString();
pattern = Pattern.compile("<div id=\"resultStats\">About ([0-9,]+) results</div>");
m = pattern.matcher(response);
if( m.find() ) {
long amount = Long.parseLong(m.group(1).replaceAll(",", ""));
return amount;
}
}
Running the full code I get 2930000000L
as a result.