问题
I am trying to scrape some text off of https://www.memrise.com/course/2021573/french-1-145/garden/speed_review/?source_element=ms_mode&source_screen=eos_ms
, but as you can see when it loads up the link through web-driver it automatically redirects it to a log in page. After I log in, it then goes straight to the page I want to scrape, but Beautiful Soup just keeps scraping the log in page.
How do I make it so Beautiful Soup scrapes the page I want it to and not the login page?
I have already tried putting a time.sleep()
before it scrapes to give me time to log in but that didn't work either.
soup = BeautifulSoup(requests.get("https://www.memrise.com/course/2021573/french-1-145/garden/speed_review/?source_element=ms_mode&source_screen=eos_ms").text, 'html.parser')
while True:
front_half = soup.find_all(class_='qquestion qtext')
print(front_half)
time.sleep(1)
回答1:
What you probably need is a persistent session with requests
. This answer probably covers exactly what you need. The general idea is simple:
- You open a session and send a request to the website
- Send the login post request so it logs you in
- Query the url with the same session.
You will need to understand how the login post request is structured and what data is passed (username, email, etc) and create a json
with that data.
import requests
url = 'https://www.memrise.com/course/2021573/french-1-145/garden/speed_review/?source_element=ms_mode&source_screen=eos_ms'
session = requests.session()
login_data = {
'username': ,
'csrfmiddlewaretoken': ,
'password': ,
'next': '/course/2021573/french-1-145/garden/speed_review/?source_element=ms_mode&source_screen=eos_ms'
}
session.get(url) #this will redirect you and it might load some initial cookies info
r = session.post('https://<theurl>/login.py', login_data)
if r.status_code == 200: #if accepted the request
res = session.get(url)
soup = BeautifulSoup(res.text, 'html.parser')
## (...) your scraping code
来源:https://stackoverflow.com/questions/57273335/how-to-scrape-a-page-if-it-is-redirected-to-another-before