How to scrape a website that has a username and password?


Question


I am trying to scrape a website and have my program find all the buttons and links inside it. The problem is that to reach the first page I need to enter a username and a password, and then scrape the page that appears after logging in. Every time I try, it ends up scraping the login page (the one asking for the username and password) instead. Does anyone know how to do this? This is the code I tried:

import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Use a raw string so the backslashes in the Windows path are not treated as escapes
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.ronitnisan.co.il/admin/UnPermissionPage.asp?isiframe=")
try:
    # Wait until the login form has loaded
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.NAME, "FirstName"))
    )
except:
    driver.quit()

# Fill in the login form (placeholder credentials)
userName = driver.find_element_by_name("FirstName")
userName.clear()
userName.send_keys("username")
password = driver.find_element_by_name("UserIDNumber")
password.clear()
password.send_keys("password")
time.sleep(0.5)
login = driver.find_element_by_name("submit")
login.click()

# Fetch the same URL again with requests and parse it
URL = 'https://www.ronitnisan.co.il/admin/UnPermissionPage.asp?isiframe='
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)

Answer 1:


You are starting a Chrome session up to and including the try: block, and you use that session to enter the username and password. So far, so good.

Then you abandon that session altogether and just use a requests.get() call to fetch the URL. That request carries none of the login information (no cookies or session state from the browser), because the login was done through the driver.

The human equivalent of this is logging in to a website with Firefox and then trying to visit the same website with Edge: they don't share a session, so you would have to log in again in Edge.

What you might want to try is something like this (after login.click()):

soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup)
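
Building on that, here is a minimal sketch of how you might list the links and buttons on the logged-in page. It reuses the driver from the question's code; the a and button tag names are generic assumptions, so adjust them to the site's actual markup:

from bs4 import BeautifulSoup

# Parse the page the logged-in browser is currently showing
soup = BeautifulSoup(driver.page_source, 'lxml')

# Collect every link and button exposed in the HTML
for link in soup.find_all('a', href=True):
    print('link:', link['href'])
for button in soup.find_all('button'):
    print('button:', button.get_text(strip=True))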



Answer 2:


Replace

URL = 'https://www.ronitnisan.co.il/admin/UnPermissionPage.asp?isiframe='
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)

with

driver.get(URL)

and then use find_element to track down the parts of the page you are interested in.
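For example, a rough sketch of enumerating the links and buttons through Selenium itself (assuming the usual a and button tags, and the driver variable from the question's code):

# Find all links and buttons on the page the driver is currently showing
for link in driver.find_elements_by_tag_name('a'):
    print('link:', link.get_attribute('href'))
for button in driver.find_elements_by_tag_name('button'):
    print('button:', button.text)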

Otherwise, you would need to capture the cookies from the Selenium session and pass them to requests.
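
If you do prefer requests for the scraping itself, here is a minimal sketch of copying the cookies out of the logged-in Selenium session into a requests.Session. It assumes the site keeps its login state in cookies, and reuses driver and URL from the question's code:

import requests
from bs4 import BeautifulSoup

session = requests.Session()
# Copy every cookie from the logged-in browser into the requests session
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# Now the request is made with the same login cookies the browser has
page = session.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)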



Source: https://stackoverflow.com/questions/66054454/how-to-scrape-a-website-thet-has-username-and-password
