How can I log in to morningstar.com without using a headless browser such as selenium?

Submitted by 允我心安 on 2020-01-10 03:16:39

Question


I read the answer to the question: "How to “log in” to a website using Python's Requests module?"

The answer reads: "Firstly check the source of the login form to get three pieces of information - the url that the form posts to, and the name attributes of the username and password fields."

How can I find out what the name attributes of the username and password fields are on this morningstar.com page? https://www.morningstar.com/members/login.html

I have the following code:

import requests

url = 'http://www.morningstar.com/members/login.html'
url = 'http://beta.morningstar.com'

with open('morningstar.txt') as f:
    username, password = f.read().splitlines()

with requests.Session() as s:
    payload = login_data = {
        'username': username,
        'password': password,
        }
    p = s.post(url, data=login_data)
    print(p.text)

But - among other things - it prints:

This distribution is not configured to allow the HTTP request method that was used for this request. The distribution supports only cachable requests.

What should url and data be for the post?

There is another answer, which makes use of selenium, but is it possible to avoid that?


Answer 1:


This was kind of hard; I had to use an intercepting proxy to see what the browser sends, but here it is:

import requests

s = requests.session()
auth_url = 'https://sso.morningstar.com/sso/json/msusers/authenticate'
login_url = 'https://www.morningstar.com/api/v2/user/login'
username = 'username'
password = 'password'

headers = {
    'Access-Control-Request-Method': 'POST',
    'Access-Control-Request-Headers': 'content-type,x-openam-password,x-openam-username',
    'Origin': 'https://www.morningstar.com'
}
s.options(auth_url, headers=headers)

headers = {
    'Referer': 'https://www.morningstar.com/members/login.html',
    'Content-Type': 'application/json',
    'X-OpenAM-Username': username,
    'X-OpenAM-Password': password,
    'Origin': 'https://www.morningstar.com',
}
s.post(auth_url, headers=headers)

data = {"productCode":"DOT_COM","rememberMe":False}
r = s.post(login_url, json=data)

print(s.cookies)
print(r.json())

By now you should have an authenticated session. You should see a bunch of cookies in s.cookies and some basic info about your account in r.json().


Update: the site has since changed its login mechanism (and probably its entire CMS), so the code above no longer works. The new login process involves one POST and one PATCH request to /umapi/v1/sessions, followed by a GET request to /umapi/v1/users.

import requests

sessions_url = 'https://www.morningstar.com/umapi/v1/sessions'
users_url = 'https://www.morningstar.com/umapi/v1/users'

userName = 'my email'
password = 'my pwd'
data = {'userName':userName,'password':password}

with requests.session() as s:
    r = s.post(sessions_url, json=data)
    # The response should be 200 if creds are valid, 401 if not
    assert r.status_code == 200
    s.patch(sessions_url)
    r = s.get(users_url)
    #print(r.json()) # contains account details

The URLs and other required values, such as the POST data, can be obtained from the developer console (Ctrl+Shift+I) of a web browser, under the Network tab, while logging in manually.
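If you would rather get an exception than a bare assert when the credentials are rejected, the same POST, PATCH, GET sequence can be wrapped in a small helper. This is only a sketch: the names login and LoginError are made up for illustration, the endpoints are the ones from the snippet above and may have changed again since, and session is assumed to behave like a requests.Session.

```python
SESSIONS_URL = 'https://www.morningstar.com/umapi/v1/sessions'
USERS_URL = 'https://www.morningstar.com/umapi/v1/users'


class LoginError(Exception):
    """Hypothetical exception: the sessions endpoint rejected the login."""


def login(session, user_name, password):
    """Run the POST -> PATCH -> GET sequence on an existing session.

    `session` is assumed to behave like requests.Session.
    Returns the response of the final GET request.
    """
    payload = {'userName': user_name, 'password': password}
    r = session.post(SESSIONS_URL, json=payload)
    # Per the answer above: 200 if the credentials are valid, 401 if not.
    if r.status_code != 200:
        raise LoginError('login failed with HTTP status %d' % r.status_code)
    session.patch(SESSIONS_URL)
    return session.get(USERS_URL)


# Usage (needs the requests package and real credentials):
#
#     import requests
#     with requests.Session() as s:
#         account = login(s, 'my email', 'my pwd').json()
```

Passing the session in, rather than creating it inside the function, keeps the cookies available to the caller for later requests.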




Answer 2:


As seen in the page source, the username input field is:

<input id="uim-uEmail-input" name="uEmail" placeholder="E-mail Address" data-msat="formField-inputemailuEmail-login" type="email">

The password input field is:

<input id="uim-uPassword-input" name="uPassword" placeholder="Password" data-msat="formField-inputpassworduPassword-login" type="password">

In each tag, the field name is the value that follows name=:

Username: "uEmail" Password: "uPassword"
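To pull the name attributes out of a page programmatically instead of reading the source by eye, something like the following sketch should work. It uses only the standard library's html.parser; the class InputNameCollector and the function input_names are made-up names, and the sample string is simply the two tags quoted above. Note that if the login form is rendered by JavaScript, the HTML fetched with requests may not contain these inputs at all.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect (type, name) pairs for every <input> tag that has a name."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == 'input':
            attrs = dict(attrs)
            if 'name' in attrs:
                # Inputs without an explicit type default to "text".
                self.fields.append((attrs.get('type', 'text'), attrs['name']))


def input_names(html):
    """Return the (type, name) pairs of all named inputs in `html`."""
    collector = InputNameCollector()
    collector.feed(html)
    collector.close()
    return collector.fields


sample = (
    '<input id="uim-uEmail-input" name="uEmail" '
    'placeholder="E-mail Address" type="email">'
    '<input id="uim-uPassword-input" name="uPassword" '
    'placeholder="Password" type="password">'
)
print(input_names(sample))
# prints [('email', 'uEmail'), ('password', 'uPassword')]
```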



Source: https://stackoverflow.com/questions/48228739/how-can-i-log-in-to-morningstar-com-without-using-a-headless-browser-such-as-sel
