Question
I am trying to scrape a website and have my program find all the buttons and links inside it. My problem is that to reach the first page I need to enter a username and a password, and then scrape the page that shows after logging in, but every time I scrape I get the login page (the one with the username and password) instead. Does anyone know how to do that? Because I don't know how. This is the code I tried:
import requests
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Raw string so the backslashes in the Windows path are not treated as escapes
PATH = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://www.ronitnisan.co.il/admin/UnPermissionPage.asp?isiframe=")

try:
    # Wait up to 10 seconds for the login form to appear
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.NAME, "FirstName"))
    )
except TimeoutException:
    driver.quit()
    raise

userName = driver.find_element_by_name("FirstName")
userName.clear()
userName.send_keys("username")

password = driver.find_element_by_name("UserIDNumber")
password.clear()
password.send_keys("password")

time.sleep(0.5)

login = driver.find_element_by_name("submit")
login.click()

# Fetch the page again, this time with requests instead of the browser
URL = 'https://www.ronitnisan.co.il/admin/UnPermissionPage.asp?isiframe='
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
Answer 1:
You are starting a Chrome 'session' (I don't know if that is the correct word for it) up until and including the try: block, and you use that session to enter the username and password. So far so good.
Then you abandon that session altogether and just use a requests.get() call to fetch the URL. That request does not carry any of the login information (no cookies or session data from your browser), because the login was done via the driver.
The human equivalent of this is logging into a website with Firefox and then trying to visit the same website with Edge. They won't share the same session, so you would have to log in again in Edge.
What you might want to try is something like this (after login.click()):
soup = BeautifulSoup(driver.page_source, 'lxml')
print(soup)
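Since the goal was to collect all the buttons and links, here is a minimal sketch of what you could do with that soup. The find_all() calls are standard BeautifulSoup; treating the site's buttons as <button> tags is an assumption about its markup:

# Assumes `soup` was built from driver.page_source as shown above
for link in soup.find_all('a'):            # every anchor tag on the logged-in page
    print(link.get('href'))
for button in soup.find_all('button'):     # assumes buttons are <button> elements
    print(button.get_text(strip=True))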
Answer 2:
Replace
URL = 'https://www.ronitnisan.co.il/admin/UnPermissionPage.asp?isiframe='
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)
with
driver.get(URL)
and then use find_element to track down the parts of the page you are interested in.
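For example, a minimal sketch of collecting the links and buttons through the still-logged-in driver (find_elements() and get_attribute() are standard Selenium calls; treating links as <a> tags and buttons as <button> tags is an assumption about the page):

# The driver kept the cookies from the login, so this sees the logged-in page
driver.get(URL)
for link in driver.find_elements(By.TAG_NAME, 'a'):
    print(link.get_attribute('href'))
for button in driver.find_elements(By.TAG_NAME, 'button'):
    print(button.text)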
Otherwise, you would want to capture the driver's cookies and use them with requests.
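A minimal sketch of that cookie hand-off, assuming the site's login relies only on cookies (get_cookies() is a standard Selenium call, and requests.Session is standard requests):

session = requests.Session()
# Copy every cookie from the logged-in browser into the requests session
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])

# requests now sends the same authenticated cookies as the browser
page = session.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
print(soup)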
Source: https://stackoverflow.com/questions/66054454/how-to-scrape-a-website-thet-has-username-and-password