How to find broken links in Selenium + Python

Frontend · Unresolved · 3 answers · 1240 views
再見小時候 2021-01-03 06:11

I'm trying to find broken links with Selenium and Python, but I'm getting an error in my code:

import requests
from selenium import webdriver

chrome_driver_path         


        
3 Answers
  • 2021-01-03 06:37

    You seem to be missing a closing parenthesis on the line below, or is it just a typo? It should read:

    r = requests.head(link.get_attribute('href'))
    
  • 2021-01-03 06:45
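    import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    chrome_driver_path = "D:\\drivers\\chromedriver.exe"
    driver = webdriver.Chrome(service=Service(chrome_driver_path))
    driver.get("https://example.com/")  # placeholder: the page whose links you want to check
    # Read the hrefs up front: the elements go stale once driver.get() leaves the page
    links = [a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")]
    for link in links:
        if not link or not link.startswith(("http://", "https://")):
            continue  # skip anchors without an href and mailto:/javascript: links
        r = requests.head(link)
        if r.status_code != 404:
            driver.get(link)
        else:
            print(f"{link} isn't available.")
    driver.quit()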
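The loop above treats only 404 as broken, so codes such as 410 or 500 slip through, and it sends one request per anchor even when the same URL appears several times on a page. A minimal sketch of two helpers that address this; the names unique_http_links and is_broken are hypothetical, and counting every 4xx or 5xx status as broken is an assumption:

```python
from typing import Iterable, List, Optional


def unique_http_links(hrefs: Iterable[Optional[str]]) -> List[str]:
    """Return http(s) URLs in first-seen order, dropping None, blanks and duplicates.

    get_attribute("href") returns None for anchors without an href, and
    schemes such as mailto: or javascript: cannot be checked over HTTP.
    """
    seen = set()
    result = []
    for href in hrefs:
        if not href or not href.startswith(("http://", "https://")):
            continue
        if href not in seen:
            seen.add(href)
            result.append(href)
    return result


def is_broken(status_code: int) -> bool:
    """Assumption: any 4xx or 5xx status is broken; 2xx and 3xx are not."""
    return status_code >= 400
```

With Selenium this could be used as links = unique_http_links(a.get_attribute("href") for a in driver.find_elements(By.TAG_NAME, "a")), then printing each link for which is_broken(requests.head(link).status_code) is true. Deduplicating first means a URL that is linked several times on one page is only requested once.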
    
  • 2021-01-03 06:46

    To find the status of the links on the page you can use the following solution:

    • Code Block:

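      import requests
      from selenium import webdriver
      from selenium.webdriver.chrome.service import Service
      from selenium.webdriver.common.by import By

      options = webdriver.ChromeOptions()
      options.add_argument("start-maximized")
      options.add_argument("disable-infobars")
      # Selenium 4: pass the driver path through Service and the options through options=
      service = Service(r"C:\Utility\BrowserDrivers\chromedriver.exe")
      driver = webdriver.Chrome(service=service, options=options)
      driver.get("https://google.co.in/")
      links = driver.find_elements(By.CSS_SELECTOR, "a")
      for link in links:
          href = link.get_attribute("href")
          # Skip anchors without an href (None) and non-HTTP links such as mailto:
          if not href or not href.startswith(("http://", "https://")):
              continue
          r = requests.head(href)
          print(href, r.status_code)
      driver.quit()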
      
    • Console Output:

      https://mail.google.com/mail/?tab=wm 302
      https://www.google.co.in/imghp?hl=en&tab=wi 200
      https://www.google.co.in/intl/en/options/ 301
      https://myaccount.google.com/?utm_source=OGB&utm_medium=app 302
      https://www.google.co.in/webhp?tab=ww 200
      https://maps.google.co.in/maps?hl=en&tab=wl 302
      https://www.youtube.com/?gl=IN 200
      https://play.google.com/?hl=en&tab=w8 302
      https://news.google.co.in/nwshp?hl=en&tab=wn 301
      https://mail.google.com/mail/?tab=wm 302
      https://www.google.com/contacts/?hl=en&tab=wC 302
      https://drive.google.com/?tab=wo 302
      https://www.google.com/calendar?tab=wc 302
      https://plus.google.com/?gpsrc=ogpy0&tab=wX 302
      https://translate.google.co.in/?hl=en&tab=wT 200
      https://photos.google.com/?tab=wq&pageId=none 302
      https://www.google.co.in/intl/en/options/ 301
      https://docs.google.com/document/?usp=docs_alc 302
      https://books.google.co.in/bkshp?hl=en&tab=wp 200
      https://www.blogger.com/?tab=wj 405
      https://hangouts.google.com/ 302
      https://keep.google.com/ 302
      https://earth.google.com/web/ 200
      https://www.google.co.in/intl/en/options/ 301
      https://accounts.google.com/ServiceLogin?hl=en&passive=true&continue=https://www.google.co.in/ 200
      https://www.google.co.in/webhp?hl=en&sa=X&ved=0ahUKEwj0qNPqnqHbAhXYdn0KHXpeAo0QPAgD 200
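In the output above, https://www.blogger.com/?tab=wj returns 405 (Method Not Allowed), which usually means the server rejected the HEAD method rather than that the page is missing. A minimal sketch of a HEAD-then-GET fallback, assuming 405 is the only status worth retrying. It swaps in the standard library's urllib.request for requests so it runs without extra packages; fetch_status and link_status are hypothetical names. Unlike requests.head, which does not follow redirects by default (hence the 301 and 302 codes above), urlopen follows redirects, so it reports the final status.

```python
import urllib.error
import urllib.request
from typing import Callable

# A fetcher takes (url, method) and returns an HTTP status code.
Fetcher = Callable[[str, str], int]


def fetch_status(url: str, method: str, timeout: float = 10.0) -> int:
    """Return the status code of a single request made with urllib."""
    req = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            # urlopen follows redirects, so this is the final status
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the code is on the exception
        return err.code


def link_status(url: str, fetch: Fetcher = fetch_status) -> int:
    """Try HEAD first and fall back to GET if the server answers 405."""
    status = fetch(url, "HEAD")
    if status == 405:  # Method Not Allowed: the server rejects HEAD, not the URL
        status = fetch(url, "GET")
    return status
```

For example, link_status("https://www.blogger.com/?tab=wj") sends a HEAD request and repeats it as a GET only if the first answer is 405. Passing fetch as a parameter keeps the fallback logic testable without network access. Network failures such as DNS errors raise urllib.error.URLError, which a checker would likely also report as a broken link.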
      