Question
I wrote the following code hoping to open a new tab with a few parameters set and then scrape the data table that appears on the new tab.
#Open Webpage
url = "https://www.website.com"
driver=webdriver.Chrome(executable_path=r"C:\mypathto\chromedriver.exe")
driver.get(url)
#Click Necessary Parameters
driver.find_element_by_partial_link_text('Output').click()
driver.find_element_by_xpath('//*[@id="flexOpt"]/table/tbody/tr/td[2]/input[3]').click()
driver.find_element_by_xpath('//*[@id="flexOpt"]/table/tbody/tr/td[2]/input[4]').click()
driver.find_element_by_xpath('//*[@id="repOpt"]/table[2]/tbody/tr/td[2]/input[4]').click()
time.sleep(2)
driver.find_element_by_partial_link_text('Dates').click()
driver.find_element_by_xpath('//*[@id="RangeOption"]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[1]/td[2]/select/option[2]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[1]/td[3]/select/option[1]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[1]/td[4]/select/option[1]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[2]/td[2]/select/option[2]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[2]/td[3]/select/option[31]').click()
driver.find_element_by_xpath('//*[@id="Range"]/table/tbody/tr[2]/td[4]/select/option[1]').click()
time.sleep(2)
driver.find_element_by_partial_link_text('Groupings').click()
driver.find_element_by_xpath('//*[@id="availFld_DATE"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_LOCID"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_STATE"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_DDSO_SA"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_CLASS_ID"]/a/img').click()
driver.find_element_by_xpath('//*[@id="availFld_REGION"]/a/img').click()
time.sleep(2)
driver.find_element_by_partial_link_text('Run').click()
time.sleep(2)
df_url = driver.switch_to_window(driver.window_handles[0])
page = requests.get(df_url).text
soup = BeautifulSoup(page, features = 'html5lib')
soup.prettify()
However, the following error pops up when I run it.
requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
Note that the new tab always has the same URL, regardless of the parameters. In other words, if the new tab opens www.website.com/b the first time, it also opens www.website.com/b the third, fourth, etc. time, even when the parameters change. Any thoughts?
Answer 1:
The problem lies here:
df_url = driver.switch_to_window(driver.window_handles[0])
page = requests.get(df_url).text
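df_url does not hold the URL of the page: switch_to_window only changes which window the driver is focused on and returns None, which is why requests complains about the invalid URL 'None'. To get the URL, read driver.current_url after switching windows; it gives the URL of the active window.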
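A minimal sketch of that fix, assuming the report opens in a second tab. The helper name html_of_new_tab is hypothetical, and driver.switch_to.window is used as the newer spelling of switch_to_window:

```python
def html_of_new_tab(driver, original_handle):
    """Switch to the first window that is not original_handle.

    Returns a (url, html) pair read from the newly focused window.
    """
    for handle in driver.window_handles:
        if handle != original_handle:
            # switch_to.window returns None; it only changes the driver's focus.
            driver.switch_to.window(handle)
            # Both values are read from the window that is now active.
            return driver.current_url, driver.page_source
    raise RuntimeError("no new tab was opened")
```

The returned html could then be passed straight to BeautifulSoup(html, features='html5lib'). Reading driver.page_source instead of calling requests.get on the URL is a design choice here: since the new tab reportedly has the same URL whatever the parameters are, a separate request would probably not see the table the browser rendered.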
Some other pointers:
- finding elements by XPath is relatively inefficient
- instead of time.sleep, you can look into using explicit waits (WebDriverWait), which wait for a condition rather than for a fixed amount of time
Answer 2:
Define the url below the driver variable, because the webdriver is started first and only then is the provided URL loaded:
driver=webdriver.Chrome(executable_path=r"C:\mypathto\chromedriver.exe")
url = "https://www.website.com"
Source: https://stackoverflow.com/questions/61237316/how-to-use-selenium-to-go-from-one-url-tab-to-another-before-scraping