I wrote a spider program in Python that can recursively crawl web pages. To avoid downloading the same page more than once, I store the URLs in a list, like this:
urls = []
You can make urls a set instead:
urls = set()

def download(mainPage):  # mainPage is a link
    global urls
    links = getHrefLinks(mainPage)
    for l in links:
        if l not in urls:
            urls.add(l)  # instead of append
            downPage(l)
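If it helps, here is a minimal, self-contained sketch of the same idea using only the standard library. It is not your code: getHrefLinks is replaced by a small html.parser-based extractor, and the downloading/recursion are folded into one function, so treat the names as stand-ins for your own helpers.

    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collects href values from <a> tags."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    def get_href_links(page_url):
        # Fetch the page and return absolute URLs of every link on it.
        html = urlopen(page_url).read().decode("utf-8", errors="replace")
        parser = LinkExtractor()
        parser.feed(html)
        return [urljoin(page_url, link) for link in parser.links]

    urls = set()  # visited URLs; a set gives O(1) average membership tests

    def download(main_page):
        for link in get_href_links(main_page):
            if link not in urls:   # skip pages we have already seen
                urls.add(link)
                download(link)     # recurse into the newly found page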
Membership tests on a set (x in s) are O(1) in the average case, which is better than the O(n) average case of the same test on a list.
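You can see the difference with a quick timing experiment (the URLs and sizes here are made up; absolute timings will vary by machine, but the gap grows with the size of the collection):

    import timeit

    n = 100_000
    as_list = [f"https://example.com/page/{i}" for i in range(n)]
    as_set = set(as_list)
    missing = "https://example.com/not-there"  # worst case for the list: full scan

    print(timeit.timeit(lambda: missing in as_list, number=1_000))  # O(n) per lookup
    print(timeit.timeit(lambda: missing in as_set, number=1_000))   # O(1) average per lookup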