How to fix a 403 response when using HttpURLConnection in Selenium, even though the links open manually without any issue

Question

I was checking the active links on a website with Selenium WebDriver and Java. I collected the links into a list, and while verifying them I get a 403 Forbidden response for every link on the site. It is a public website that anyone can access, and the links work properly when I click them manually. Why is the response code not 200, and what can be done in this situation?

This is for Selenium WebDriver with Java:

import java.net.HttpURLConnection;
import java.net.URL;

for (int j = 0; j < activelinks.size(); j++) {
    String href = activelinks.get(j).getAttribute("href");
    System.out.println("Active Link address and status >>> " + href);
    HttpURLConnection connection = (HttpURLConnection) new URL(href).openConnection();
    connection.connect();
    String response = connection.getResponseMessage();
    int responsecode = connection.getResponseCode();
    connection.disconnect();
    System.out.println(href + ">>" + response + " " + responsecode);
}

I expect response code 200, but the actual output is 403.

Answer 1:

403 Forbidden

The HTTP 403 Forbidden client error status response code indicates that the server understood the request but refuses to authorize it.

This status is similar to 401, but in this case, re-authenticating will make no difference. The access is permanently forbidden and tied to the application logic, such as insufficient rights to a resource.

Reason

I don't see any issue in your code block. However, it is possible that the WebDriver-controlled browser client is being detected, so that subsequent requests get blocked. Numerous factors can give it away, including:

User agent
Plugins
Languages
WebGL
Browser features
Missing image

You can find a couple of detailed discussions in:

How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
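Of the factors above, the user agent is the one that applies directly to the HttpURLConnection requests in the question: unless the http.agent system property is set, HttpURLConnection identifies itself with a User-Agent of the form Java/<version>, which some servers refuse. The following is a minimal sketch, assuming (not confirmed for the asker's site) that the server filters on that header; it copies a browser user agent onto each connection before connecting. The class and method names are hypothetical and the user agent string in main is a placeholder; in a Selenium test the real value could be read with ((JavascriptExecutor) driver).executeScript("return navigator.userAgent;").

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class BrowserLikeLinkCheck {

    /**
     * Opens (but does not connect) an HttpURLConnection that sends the given
     * User-Agent instead of the JDK default "Java/<version>".
     */
    static HttpURLConnection openWithUserAgent(String href, String userAgent) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(href).toURL().openConnection();
        // Request headers must be set before the connection is established
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    /** Performs the request and returns the HTTP status code. */
    static int statusOf(String href, String userAgent) throws IOException {
        HttpURLConnection connection = openWithUserAgent(href, userAgent);
        try {
            return connection.getResponseCode(); // connects implicitly
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder value (hypothetical); in Selenium read it from navigator.userAgent
        String browserUserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        HttpURLConnection connection = openWithUserAgent("https://example.com/", browserUserAgent);
        // No network I/O has happened yet; this only shows the configured header
        System.out.println(connection.getRequestProperty("User-Agent"));
    }
}
```

In the question's loop, statusOf(href, browserUserAgent) would replace the manual connect/getResponseCode/disconnect sequence.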

Solution

A generic solution would be to use a proxy, or rotating proxies from the Free Proxy List.

You can find a detailed discussion in Change proxy in chromedriver for scraping purposes
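With plain HttpURLConnection, routing a link check through a proxy means passing a java.net.Proxy to openConnection. Below is a minimal round-robin rotation sketch, assuming you already have a list of working HTTP proxies as host:port strings; the class name is hypothetical and the addresses in main are documentation-range placeholders, not real proxies. Whether rotating proxies avoids the 403 at all depends on why the site blocks the requests, which the question does not establish.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

class RotatingProxies {
    private final List<Proxy> proxies;
    private final AtomicInteger next = new AtomicInteger();

    RotatingProxies(List<String> hostPorts) {
        if (hostPorts.isEmpty()) throw new IllegalArgumentException("need at least one proxy");
        this.proxies = hostPorts.stream()
                .map(RotatingProxies::toProxy)
                .collect(Collectors.toList());
    }

    /** Parses "host:port" into an HTTP proxy without resolving DNS up front. */
    static Proxy toProxy(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0) throw new IllegalArgumentException("expected host:port, got " + hostPort);
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port));
    }

    /** Returns proxies in round-robin order: 0, 1, ..., n-1, 0, 1, ... */
    Proxy nextProxy() {
        int i = Math.floorMod(next.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    /** Opens (without connecting) a connection routed through the next proxy. */
    HttpURLConnection open(String href) throws IOException {
        return (HttpURLConnection) URI.create(href).toURL().openConnection(nextProxy());
    }

    public static void main(String[] args) {
        // Placeholder addresses (hypothetical); replace with proxies you actually control or trust
        RotatingProxies rotation = new RotatingProxies(List.of("203.0.113.10:8080", "203.0.113.11:3128"));
        for (int i = 0; i < 3; i++) {
            System.out.println(rotation.nextProxy().address());
        }
    }
}
```

The counter is an AtomicInteger so a single instance can be shared across threads, and Math.floorMod keeps the index valid even after the counter overflows. List.of requires Java 9 or later.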

Outro

You can find a couple of relevant discussions in:

Can a website detect when you are using selenium with chromedriver?
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Failed to load resource: the server responded with a status of 429 (Too Many Requests) and 404 (Not Found) with ChromeDriver Chrome through Selenium

Answer 2:

I believe you need to add the relevant cookies to the HttpURLConnection or, even better, consider switching to the OkHttp library, which the Selenium Java client uses under the hood.

So you basically need to fetch the cookies from the browser using driver.manage().getCookies() and build a proper Cookie request header for the subsequent calls.

Example code:

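import java.util.stream.Collectors;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.openqa.selenium.WebElement;

// Join the browser's cookies into one header value: "name1=value1; name2=value2"
String cookieHeader = driver.manage().getCookies().stream()
        .map(cookie -> cookie.getName() + "=" + cookie.getValue())
        .collect(Collectors.joining("; "));

OkHttpClient client = new OkHttpClient.Builder().build();

// execute() throws IOException, so the enclosing method must declare or handle it
for (WebElement activelink : activelinks) {
    String href = activelink.getAttribute("href");
    Request request = new Request.Builder()
            .url(href)
            .addHeader("Cookie", cookieHeader)
            .build();
    // Response is Closeable; closing it releases the underlying connection
    try (Response urlResponse = client.newCall(request).execute()) {
        String response = urlResponse.message();
        int responsecode = urlResponse.code();
        System.out.println(href + ">>" + response + " " + responsecode);
    }
}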
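If you would rather keep HttpURLConnection than add OkHttp, the same idea carries over: join the browser's cookies into a single Cookie header and set it before connecting. The sketch below takes the cookies as a Map of name to value so that it runs without Selenium; in a test the map could be filled from driver.manage().getCookies() using each cookie's getName() and getValue(). The class and method names are hypothetical, and the sketch assumes the 403 is caused by missing session cookies, which the answer suggests but does not prove.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class CookieForwarding {

    /** Joins cookies into a single header value: "a=1; b=2". */
    static String cookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    /** Opens (without connecting) a connection that carries the given cookies. */
    static HttpURLConnection openWithCookies(String href, Map<String, String> cookies) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(href).toURL().openConnection();
        if (!cookies.isEmpty()) {
            // Must be set before connecting; headers are fixed once the request is sent
            connection.setRequestProperty("Cookie", cookieHeader(cookies));
        }
        return connection;
    }

    public static void main(String[] args) {
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("sessionid", "abc123"); // hypothetical cookie names and values
        cookies.put("csrftoken", "xyz");
        System.out.println(cookieHeader(cookies)); // sessionid=abc123; csrftoken=xyz
    }
}
```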

If you need nothing but the response code, consider using the HEAD method, which returns only the headers and not the full response body. This saves traffic and makes your test much faster.
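With HttpURLConnection, switching to HEAD is a single setRequestMethod call. Below is a minimal sketch; the class and method names and the 10-second timeouts are assumptions. Because some servers do not implement HEAD and reply 405 Method Not Allowed, it retries with GET in that case; that fallback is a design choice of this sketch, not something the answer specifies.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class HeadStatusCheck {

    /** Opens (without connecting) a connection configured for the given method. */
    static HttpURLConnection open(String href, String method) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(href).toURL().openConnection();
        connection.setRequestMethod(method); // "HEAD" asks for headers only, no body
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(10_000);
        return connection;
    }

    /** Returns the status code, trying HEAD first and falling back to GET on 405. */
    static int statusOf(String href) throws IOException {
        int code = request(href, "HEAD");
        return code == HttpURLConnection.HTTP_BAD_METHOD ? request(href, "GET") : code;
    }

    private static int request(String href, String method) throws IOException {
        HttpURLConnection connection = open(href, method);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the URLs to check as command-line arguments
        for (String href : args) {
            System.out.println(href + " >> " + statusOf(href));
        }
    }
}
```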

Source: https://stackoverflow.com/questions/56668773/how-to-fix-403-response-when-using-httpurlconnection-in-selenium-since-the-links

Tags: selenium, selenium-webdriver, httpresponse, httpconnection