How can I log in to morningstar.com without using a headless browser such as selenium?

Question

I read the answer to the question: "How to “log in” to a website using Python's Requests module?"

The answer reads: "Firstly check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields."

How can I find the name attributes of the username and password fields on this morningstar.com page? https://www.morningstar.com/members/login.html

I have the following code:

import requests

url = 'http://www.morningstar.com/members/login.html'
url = 'http://beta.morningstar.com'

with open('morningstar.txt') as f:
    username, password = f.read().splitlines()

with requests.Session() as s:
    payload = login_data = {
        'username': username,
        'password': password,
        }
    p = s.post(url, data=login_data)
    print(p.text)

But, among other things, it prints:

This distribution is not configured to allow the HTTP request method that was used for this request. The distribution supports only cachable requests.

What should url and data be for the POST request?

There is another answer, which makes use of selenium, but is it possible to avoid that?

Answer 1:

This was fairly hard; I had to use an intercepting proxy to see what the browser sends, but here it is:

import requests

# Reuse one session so cookies set by each step carry over to the next.
s = requests.Session()
auth_url = 'https://sso.morningstar.com/sso/json/msusers/authenticate'
login_url = 'https://www.morningstar.com/api/v2/user/login'
username = 'username'
password = 'password'

# Step 1: send the CORS preflight (OPTIONS) request that precedes the POST.
headers = {
    'Access-Control-Request-Method': 'POST',
    'Access-Control-Request-Headers': 'content-type,x-openam-password,x-openam-username',
    'Origin': 'https://www.morningstar.com'
}
s.options(auth_url, headers=headers)

# Step 2: authenticate; the credentials travel in the X-OpenAM-* headers, not in the body.
headers = {
    'Referer': 'https://www.morningstar.com/members/login.html',
    'Content-Type': 'application/json',
    'X-OpenAM-Username': username,
    'X-OpenAM-Password': password,
    'Origin': 'https://www.morningstar.com',
}
s.post(auth_url, headers=headers)

# Step 3: log in to the site itself using the same session.
data = {"productCode":"DOT_COM","rememberMe":False}
r = s.post(login_url, json=data)

print(s.cookies)
print(r.json())

By now you should have an authenticated session. You should see a bunch of cookies in s.cookies and some basic info about your account in r.json().

Update: the site has since changed its login mechanism (and probably its entire CMS), so the code above no longer works. The new login process involves one POST and one PATCH request to /umapi/v1/sessions, followed by a GET request to /umapi/v1/users.

import requests

sessions_url = 'https://www.morningstar.com/umapi/v1/sessions'
users_url = 'https://www.morningstar.com/umapi/v1/users'

userName = 'my email'
password = 'my pwd'
data = {'userName':userName,'password':password}

with requests.session() as s:
    r = s.post(sessions_url, json=data)
    # The response should be 200 if creds are valid, 401 if not
    assert r.status_code == 200
    s.patch(sessions_url)
    r = s.get(users_url)
    #print(r.json()) # contains account details

The URLs and other required values, such as the POST data, can be found in the browser's developer console (Ctrl+Shift+I), under the Network tab.
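The POST, PATCH, GET sequence described in this answer can also be wrapped in a small function that reports which step failed instead of relying on a bare assert. This is only a sketch: the names login and LoginError are hypothetical, and treating any status of 400 or above as a failure is an assumption (the answer only states that the POST returns 200 for valid credentials and 401 otherwise).

```python
SESSIONS_URL = 'https://www.morningstar.com/umapi/v1/sessions'
USERS_URL = 'https://www.morningstar.com/umapi/v1/users'


class LoginError(Exception):
    """Hypothetical error type: one step of the login flow was rejected."""


def login(session, user_name, password):
    """Run the POST, PATCH, GET sequence and return the final response.

    `session` is expected to behave like requests.Session, i.e. to offer
    request(method, url, **kwargs) and return objects with a status_code.
    """
    steps = [
        ('POST', SESSIONS_URL, {'json': {'userName': user_name, 'password': password}}),
        ('PATCH', SESSIONS_URL, {}),
        ('GET', USERS_URL, {}),
    ]
    response = None
    for method, url, kwargs in steps:
        response = session.request(method, url, **kwargs)
        # Assumption: any 4xx/5xx status means the step failed.
        if response.status_code >= 400:
            raise LoginError(f'{method} {url} returned {response.status_code}')
    return response
```

With the requests library installed, it could be called as login(requests.Session(), 'my email', 'my pwd'); the returned response's json() should then hold the account details, provided the site still uses this flow.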

Answer 2:

As seen in the page source, the username input field is:

<input id="uim-uEmail-input" name="uEmail" placeholder="E-mail Address" data-msat="formField-inputemailuEmail-login" type="email">

The password input field is:

<input id="uim-uPassword-input" name="uPassword" placeholder="Password" data-msat="formField-inputpassworduPassword-login" type="password">

In each tag, the field name is the value that follows name=:

Username: "uEmail", Password: "uPassword"
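Reading the name attributes by eye works, but they can also be pulled out programmatically. Below is a minimal sketch using Python's standard html.parser module on the two tags quoted above; the class InputNameCollector and the helper input_names are hypothetical names chosen for illustration. It assumes the input tags are present in the HTML you feed it, which may not hold if the form is rendered by JavaScript after the page loads.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect (name, type) for every <input> tag that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = []  # filled in document order

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        if 'name' in attributes:
            # Inputs without an explicit type default to "text" in HTML.
            self.fields.append((attributes['name'], attributes.get('type', 'text')))


def input_names(html):
    """Return the (name, type) pairs of all named <input> tags in `html`."""
    parser = InputNameCollector()
    parser.feed(html)
    return parser.fields


form_html = '''
<input id="uim-uEmail-input" name="uEmail" placeholder="E-mail Address" data-msat="formField-inputemailuEmail-login" type="email">
<input id="uim-uPassword-input" name="uPassword" placeholder="Password" data-msat="formField-inputpassworduPassword-login" type="password">
'''

print(input_names(form_html))
# [('uEmail', 'email'), ('uPassword', 'password')]
```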

Source: https://stackoverflow.com/questions/48228739/how-can-i-log-in-to-morningstar-com-without-using-a-headless-browser-such-as-sel

Tags

python

python-3.x

post

get

python-requests