Cookie to log in with Jsoup?

忘掉有多难 2021-01-15 02:24

For a project I'm trying to get data from Goodreads.com that is only accessible when you're logged in. I'm new to Jsoup, since I'm using it only for this

3 answers
  • 2021-01-15 02:43

    You can log in with this code:

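    import org.jsoup.Connection;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class GoodreadsLogin {

        public static void main(String[] args) throws Exception {

            // GET the home page: the response sets the session cookie, and its
            // sign-in form (id "sign_in") carries the hidden fields the server checks.
            Connection.Response execute = Jsoup
                    .connect("https://www.goodreads.com/")
                    .method(Connection.Method.GET).execute();

            // getElementById returns null if the page no longer contains this form.
            Element sign_in = execute.parse().getElementById("sign_in");
            String authenticityToken = sign_in.select("input[name=authenticity_token]").first().val();
            String n = sign_in.select("input[name=n]").first().val();

            // POST the credentials together with the hidden fields and the cookies from the GET.
            Document document = Jsoup.connect("https://www.goodreads.com/user/sign_in")
                    .data("cookieexists", "✓")
                    .data("authenticity_token", authenticityToken)
                    .data("user[email]", "user@email.com")
                    .data("user[password]", "password")
                    .data("remember_me", "on")
                    .data("n", n)
                    .cookies(execute.cookies())
                    .post();

            // Page returned after the login attempt; inspect it to see whether login succeeded.
            System.out.println(document.title());
        }
    }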
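
The code above has a limitation: post() returns only the parsed Document, so any cookies set by the login response are dropped. Calling execute() with Connection.Method.POST returns a Connection.Response instead. Its cookies() can be merged with the earlier ones and sent on later requests. The sketch below shows this under a few assumptions. The class and helper names (SessionReuse, mergeCookies, withSession) are hypothetical. The "sign_in" form and the field names come from the code above. Whether Goodreads still serves this form is not verified.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SessionReuse {

    // Hypothetical helper: combine the cookies already held with those set by a newer
    // response; newer values replace older ones, as a browser would overwrite them.
    public static Map<String, String> mergeCookies(Map<String, String> held, Map<String, String> fresh) {
        Map<String, String> merged = new HashMap<>(held);
        merged.putAll(fresh);
        return merged;
    }

    // Hypothetical helper: prepare (but do not send) a GET request carrying the session cookies.
    public static Connection withSession(String url, Map<String, String> cookies) {
        return Jsoup.connect(url).method(Connection.Method.GET).cookies(cookies);
    }

    public static void main(String[] args) throws IOException {
        // 1. GET the page with the sign-in form; keep its cookies and hidden fields.
        Connection.Response page = Jsoup.connect("https://www.goodreads.com/")
                .method(Connection.Method.GET)
                .execute();
        Element signIn = page.parse().getElementById("sign_in");

        // 2. POST with execute() rather than post(), so the Response and its cookies survive.
        //    The remaining fields from the code above (cookieexists, remember_me) are omitted here.
        Connection.Response login = Jsoup.connect("https://www.goodreads.com/user/sign_in")
                .data("authenticity_token", signIn.select("input[name=authenticity_token]").val())
                .data("n", signIn.select("input[name=n]").val())
                .data("user[email]", "user@email.com")
                .data("user[password]", "password")
                .cookies(page.cookies())
                .method(Connection.Method.POST)
                .execute();

        // 3. Send the merged cookies with any later request that needs the logged-in session.
        Map<String, String> session = mergeCookies(page.cookies(), login.cookies());
        Document home = withSession("https://www.goodreads.com/", session).get();
        System.out.println(home.title());
    }
}
```

The helpers are kept separate from main so that the cookie handling can be reused for every request made after logging in.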
    
  • 2021-01-15 02:53
    1. Goodreads requires two things when logging in: first, a session ID stored in a cookie, and second, a randomly generated number. You can get both by first visiting the login page without logging in. The response sets a cookie containing a session ID, and the form contains a hidden input field (an <input type="hidden"> element) named "n" whose value is a number. Save these and send them along with the login request: the session ID as a cookie and the number as a form value.

    Some remarks about the way I found this out:

    The first thing to realise is that, with Jsoup, you're trying to recreate exactly the requests your browser makes. So, to check whether what you have now will work, you can try to recreate the same situation in your browser.

    To recreate your code, I went to the login page and deleted all my Goodreads cookies, since your code doesn't send any cookies with its login request either. I then tried to sign in passing only the username and password form values. This gave an error saying my session had timed out. When I loaded the login page again, deleted all cookies except the session ID, and left the hidden "n" form value in place, I could log in successfully. Therefore, you should first make a plain GET request to the sign-in page. Keep the session ID cookie and the hidden form value it gives you, and pass both along with the POST request.
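
The step this answer arrives at is reading the hidden field values from the sign-in form so they can be posted back. It could look like the sketch below. The class and method names are hypothetical, and the HTML is a stand-in, not Goodreads' real markup. In real use, the form would come from parsing the GET response with Connection.Response.parse(). The values would then be posted together with that response's cookies().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HiddenFields {

    // Collects name -> value for every named hidden input inside the given form,
    // such as the "n" value described above.
    public static Map<String, String> hiddenFields(Element form) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : form.select("input[type=hidden][name]")) {
            fields.put(input.attr("name"), input.val());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in markup for a sign-in form (hypothetical, not the real page).
        String html = "<form id='sign_in' action='/user/sign_in' method='post'>"
                + "<input type='hidden' name='n' value='667387'>"
                + "<input type='email' name='user[email]'>"
                + "</form>";
        Document doc = Jsoup.parse(html, "https://www.goodreads.com/");
        System.out.println(hiddenFields(doc.getElementById("sign_in")));
    }
}
```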

    1. The API may have changed, or there may simply be several ways to do it. Either way, Connection.Method.POST works fine.

    2. Yes, they refer to the names of the input fields. When a browser submits a form, each field is sent under its name attribute. The id attribute is not sent at all, so the keys you pass to data(...) must match the name values (for example user[email] and user[password]).

    3. If you look at the source code of the sign-in form, you can see that the form's "action" attribute points to the sign-in page itself, so that is where the request is sent.
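
Points 2 and 3 can be checked with jsoup itself. The sketch below resolves a form's action attribute against the page URL and lists the fields that jsoup's FormElement.formData() would submit. The input that has only an id and no name does not appear. The class and method names are hypothetical, and the HTML is a stand-in rather than the real Goodreads page. The empty-action fallback follows the HTML rule that such a form submits to the page's own URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormInspector {

    // Where the browser would send the form: the action attribute resolved against
    // the page URL, or the page URL itself when action is missing or empty.
    public static String submitUrl(FormElement form, Document page) {
        String action = form.attr("action");
        return action.isEmpty() ? page.location() : form.absUrl("action");
    }

    // The key/value pairs jsoup would submit for this form. Keys come from the
    // name attribute; controls without a name (even ones with an id) are left out.
    public static Map<String, String> submittedFields(FormElement form) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Connection.KeyVal kv : form.formData()) {
            fields.put(kv.key(), kv.value());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in markup (hypothetical); the real Goodreads form is not reproduced here.
        String html = "<form id='sign_in' action='/user/sign_in' method='post'>"
                + "<input type='hidden' name='n' value='667387'>"
                + "<input type='email' id='user_email' name='user[email]' value='user@email.com'>"
                + "<input type='text' id='only_an_id' value='ignored'>"
                + "</form>";
        Document page = Jsoup.parse(html, "https://www.goodreads.com/");
        // jsoup's HTML parser creates FormElement instances for <form> tags.
        FormElement form = (FormElement) page.getElementById("sign_in");
        System.out.println(submitUrl(form, page));
        System.out.println(submittedFields(form));
    }
}
```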

    PS. As a general tip, you can use the Firefox extension "Tamper Data" to remove form data or even cookies (though there are easier extensions for that).
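
In the same spirit as recreating the browser's requests, it can help to print what the jsoup request is about to send. You can then compare it field by field with what the browser sent. The sketch below only reads the prepared Connection.Request and sends nothing. The class name and the cookie are hypothetical placeholders. Mask the password before sharing such output.

```java
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class RequestDump {

    // Renders what a prepared jsoup request would send (method, URL, form data, cookies),
    // so it can be compared with the request the browser made.
    public static String describe(Connection conn) {
        Connection.Request req = conn.request();
        StringBuilder sb = new StringBuilder();
        sb.append(req.method()).append(' ').append(req.url()).append('\n');
        for (Connection.KeyVal kv : req.data()) {
            sb.append("data   ").append(kv.key()).append('=').append(kv.value()).append('\n');
        }
        for (Map.Entry<String, String> c : req.cookies().entrySet()) {
            sb.append("cookie ").append(c.getKey()).append('=').append(c.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Connection conn = Jsoup.connect("https://www.goodreads.com/user/sign_in")
                .method(Connection.Method.POST)
                .data("user[email]", "user@email.com")
                .data("n", "667387")
                .cookie("session", "placeholder-value"); // hypothetical cookie name and value
        System.out.print(describe(conn));
    }
}
```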

  • 2021-01-15 03:05
    1. Look carefully at what data is posted on login:

      user[email]:email@email

      remember_me:on

      user[password]:plain_password

      n:667387

    So your POST request must send exactly the same keys.

    2. Make sure you use the right import: import org.jsoup.Connection.Method; but Connection.Method.POST works just as well.

    3. See point 1.

    4. Yes, you are correct.

    5. What is the question?
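
For reference, this is roughly how the keys from point 1 travel in the request body. An HTML form POST encodes them as application/x-www-form-urlencoded, and Jsoup also does this for you when you call data(...). The JDK-only sketch below reproduces that encoding so you can compare it with the body your browser sends. It uses a hypothetical class name and the placeholder values from point 1.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {

    // Encodes key/value pairs as an application/x-www-form-urlencoded body:
    // each key and value is percent-encoded as UTF-8, and pairs are joined with '&'.
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // The keys and placeholder values listed in point 1, in the same order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("user[email]", "email@email");
        fields.put("remember_me", "on");
        fields.put("user[password]", "plain_password");
        fields.put("n", "667387");
        // prints: user%5Bemail%5D=email%40email&remember_me=on&user%5Bpassword%5D=plain_password&n=667387
        System.out.println(encode(fields));
    }
}
```

The square brackets in user[email] and user[password] are percent-encoded on the wire, but the server decodes them back, so the keys passed to Jsoup stay as written.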
