问题
As the title suggests, I want to run code like this (top_url_list is just a list of urls I'm looping through to find instances of these filename conventions that I'm looking for with regex:
name_files = []
for i in top_url_list:
result = re.findall("\/([a-z]+[0-9][0-9]\W[a-z]+)", str(urlopen(i).read()))
Where the objective is to grab all of the instances where the regex checks out, hence the 'findall()" function. The problem is, it's important that I only get distinct/uniques of each instance. Is this possible?
回答1:
re.findall() gives non-overlapping matches of pattern in string, as a list of strings. You can convert it into unique values using set(). Sample example regarding how set()
works:
>>> my_list = [1, 5, 2, 5, 2, 7]
>>> set(my_list)
set([1, 2, 5, 7]) # Duplicate entries of 5 and 2 are removed
来源:https://stackoverflow.com/questions/40165530/re-findall-where-i-want-all-unique-instances-of-the-regex-on-the-page