Question
I am trying to use Selenium in Python to pull some data from https://www.seekingalpha.com. The front page has a "Sign-in/Join now" link. I used Selenium to click it, which brought up a popup asking for sign-in information with another "Sign in" button. It seems my code below can enter my username and password, but my attempt to click the "sign in" button didn't get the right response (it clicked on the ad below the popup box.)
I am using Python 3.5.
Here is my code:
from time import sleep
from selenium import webdriver

driver = webdriver.Chrome()
url = "https://seekingalpha.com"
driver.get(url)
sleep(5)
driver.find_element_by_xpath('//*[@id ="sign-in"]').click()
sleep(5)
driver.find_element_by_xpath('//*[@id ="authentication_login_email"]').send_keys("xxxx@gmail.com")
driver.find_element_by_xpath('//*[@id ="authentication_login_password"]').send_keys("xxxxxxxxx")
driver.find_element_by_xpath('//*[@id="log-btn"]').click()
Any advice/suggestion is greatly appreciated.
Answer 1:
EDIT: my previous 'answer' was wrong, so I have updated it.
Got you man, this is what you need to do:
1.) grab the latest Firefox
2.) grab the latest geckodriver
3.) use a Firefox driver
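import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

# Path to the local geckodriver binary; adjust for your machine.
driver = webdriver.Firefox(executable_path=r'd:\Python_projects\geckodriver.exe')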
url = "https://seekingalpha.com"
driver.get(url)
sign_in = driver.find_element_by_xpath('//*[@id ="sign-in"]')
driver.execute_script('arguments[0].click()', sign_in)
time.sleep(1)
email = driver.find_element_by_xpath('//*[@id ="authentication_login_email"]')
email.send_keys("xxxx@gmail.com")
pw = driver.find_element_by_xpath('//*[@id ="authentication_login_password"]')
pw.send_keys("xxxxxxxxx")
pw.send_keys(Keys.ENTER)
Explanation:
It is easy for a site to detect Selenium when the browser itself reports it (and it seems this page does not want to be scraped):
The webdriver read-only property of the navigator interface indicates whether the user agent is controlled by automation.
I looked for a way to bypass detection and found this article.
Your best of avoiding detection when using Selenium would require you to use one of the latest builds of Firefox which don’t appear to give off any obvious sign that you are using Firefox.
I gave it a shot: after launch, the correct page design loaded, and the login attempt behaved the same as a manual attempt.
With a bit more searching, I also found that if you modify your chromedriver, you have a chance to bypass detection even with Chrome.
Learned something new today too. \o/
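To see what the page sees, the flag can also be read from Python through the driver. This is only a minimal sketch: it assumes the driver object exposes Selenium's execute_script method, and the helper name is_automation_flag_set is made up for illustration.

```python
def is_automation_flag_set(driver):
    """Return True if the page's navigator.webdriver flag is truthy.

    `driver` is assumed to be any object with Selenium's
    execute_script(script) method, e.g. webdriver.Firefox().
    """
    # navigator.webdriver can come back as true, false, or undefined
    # (None in Python) depending on the browser build, so coerce to bool.
    return bool(driver.execute_script("return navigator.webdriver"))


# Usage with a real browser (not run here; needs Selenium and a driver binary):
#     from selenium import webdriver
#     driver = webdriver.Firefox()
#     driver.get("https://seekingalpha.com")
#     print(is_automation_flag_set(driver))
```

A falsy result only means this one signal is absent; a site may well use other checks.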
An additional idea:
I ran a small experiment using embedded Chromium (CEF). If you open a Chrome window via Selenium, open the console, and check navigator.webdriver, the result will be true. If you open a CEF window and then remote-debug it, however, the flag will be false. I did not check edge cases, but non-edge-case scenarios should be fine with CEF.
So what you may want to check out in the future:
1.) in command line: pip install cefpython3
2.) git clone https://github.com/cztomczak/cefpython.git
3.) open your CEF project and find hello.py in the examples
4.) update the startup to cef.Initialize(settings={"remote_debugging_port":9222})
5.) run hello.py
(this was the initial, one time setup, you may customize it in the future, but the main thing is done, you have a browser with a debug port open)
6.) modify chrome startup to:
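from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
# Attach to the already-running CEF browser through its debug port.
chrome_options.debugger_address = "127.0.0.1:9222"
# chrome_driver_executable is the path to your chromedriver binary (not defined here).
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver_executable)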
7.) now you have a driver without the 'automated' signature in the browser. There may be some drawbacks, like:
- CEF lags behind Chrome: right now the latest released Chrome is v76, while CEF is at v66.
- also "some stuff" may not work; for example, window.Notification is not a thing in CEF
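Steps 4 to 6 above could be sketched roughly as follows. This is a stripped-down stand-in for the hello.py example, not a copy of it; the helper names cef_settings, debugger_address and run_cef_browser are invented here, and the cefpython3 import is done lazily so the file can be loaded without the package installed.

```python
DEBUG_PORT = 9222  # same port as in step 4


def cef_settings(port=DEBUG_PORT):
    """Settings dict for cef.Initialize() that opens a remote debugging port."""
    return {"remote_debugging_port": port}


def debugger_address(port=DEBUG_PORT, host="127.0.0.1"):
    """Address string to assign to chrome_options.debugger_address (step 6)."""
    return "{}:{}".format(host, port)


def run_cef_browser(url="https://seekingalpha.com"):
    """Open a CEF window with the debug port enabled.

    Blocks until the window is closed. Needs `pip install cefpython3`.
    """
    from cefpython3 import cefpython as cef  # lazy import, see note above

    cef.Initialize(settings=cef_settings())
    cef.CreateBrowserSync(url=url, window_title="CEF with remote debugging")
    cef.MessageLoop()
    cef.Shutdown()


# To start the browser, call run_cef_browser() from a script of your own,
# then attach Selenium from a second script using debugger_address().
```

Because run_cef_browser() blocks in the CEF message loop, the Selenium script that attaches to the port has to run as a separate process.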
Answer 2:
I tried the code you provided and it works fine. I added Selenium waits just to check other options, and those also worked well. I changed two lines, replacing the sleeps with explicit waits:
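from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver.get(url)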
wait = WebDriverWait(driver, 10)
signin = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id ='sign-in']")))
#sleep(5)
signin.click()
#driver.find_element_by_xpath('//*[@id ="sign-in"]').click()
#sleep(5)
wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id ='authentication_login_email']")))
driver.find_element_by_xpath('//*[@id ="authentication_login_email"]').send_keys("xxxx@gmail.com")
With these changes, the script does click the Sign in button. What I found is that the site has captcha handling: the console output after clicking the sign-in button tells the story. I went ahead and added a user agent to your script, but that did not work either. Notice the blockscript parameter in the response of the login API and the console errors in the screenshots. However, there is no captcha in the UI.
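For anyone wondering why the explicit waits used above are preferable to fixed sleeps: conceptually, a wait polls a condition until it holds or a timeout expires, so the script continues as soon as the element is ready. The following is a library-free sketch of that idea, not Selenium's actual implementation; wait_until is a hypothetical helper, and it raises Python's built-in TimeoutError rather than Selenium's own exception.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition being met.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} s".format(timeout))
        time.sleep(poll_interval)


# Hypothetical use with a driver (not run here):
#     wait_until(lambda: driver.find_elements_by_xpath('//*[@id="log-btn"]'))
```

Unlike sleep(5), this returns right after the first successful check and fails loudly if the element never shows up.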
Source: https://stackoverflow.com/questions/57777773/how-to-use-selenium-to-click-a-button-in-a-popup-modal-box