I want to get the list of all URLs that a browser will issue a GET request for when it opens a page. For example, if we try to open cnn.com, there are multiple URLs wit
I guess you will have to create a list of all known file extensions that you do NOT want, and then scan the content of the HTTP response, checking each candidate with something like "if substring not in nono_list:".
The problem is all the hrefs ending with TLDs, forward slashes, URL-delivered variables and so on, so I think it would be easier to check for the stuff you know you don't want.
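
A rough sketch of that idea in Python, using only the standard library. The names `UNWANTED_EXTENSIONS`, `LinkCollector`, `is_wanted` and `extract_urls` are made up for illustration, and the blocklist contents are just an assumption about what you might want to skip. It only looks at `href` and `src` attributes in the static HTML, so it will miss anything the browser requests from JavaScript or CSS.

```python
from html.parser import HTMLParser
from posixpath import splitext
from urllib.parse import urljoin, urlparse

# Hypothetical blocklist: extend it with whatever you do NOT want.
UNWANTED_EXTENSIONS = {".jpg", ".png", ".gif", ".css", ".ico"}


class LinkCollector(HTMLParser):
    """Collects href and src attribute values as absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def is_wanted(url):
    # Check only the path component, so TLDs, trailing slashes and
    # query-string variables do not get mistaken for an extension.
    extension = splitext(urlparse(url).path)[1].lower()
    return extension not in UNWANTED_EXTENSIONS


def extract_urls(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.urls if is_wanted(url)]
```

You would feed it the body of the HTTP response together with the URL you fetched it from, for example `extract_urls(page_source, "http://example.com/")`. Because the check runs on the parsed path rather than on the raw string, links such as `http://example.com`, `/news/` or `page?id=3` pass through without special handling.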