I am working on a link-checking script to monitor a domain I manage. I am getting an error when the 9th URL is run through the findLinks() function.
You cannot iterate over (use the in keyword to check the contents of) None, which is the default returned from get() when it fails to find the provided name, so using an empty list as the default (second argument) will prevent the error:
for link in soup.find_all('a'):
    # collect absolute hrefs that point to google.com
    if "google.com" in link.get('href', []):
        linksToCrawl.append(link.get('href'))
You may still wish to confirm that link.get('href') returns something truthy before getting this far into the function.
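For example, one way to apply that truthiness check is to fetch the href once, then guard on it before the substring test (a minimal sketch; the HTML fragment here is invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical page: one matching link, one <a> with no href, one other domain.
html = """
<a href="https://google.com/maps">Maps</a>
<a>no href at all</a>
<a href="https://example.com/">other site</a>
"""

soup = BeautifulSoup(html, "html.parser")
linksToCrawl = []
for link in soup.find_all("a"):
    href = link.get("href")
    # Skip tags whose href is missing (None) or empty before checking it.
    if href and "google.com" in href:
        linksToCrawl.append(href)

print(linksToCrawl)  # → ['https://google.com/maps']
```

Storing the result of get() in a local variable also avoids calling it twice per tag.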