Hi, I'm working on a project for fun with the Common Crawl data. I have a subset of the most recent crawl's WARC file paths from here.
So basically I have a URL like ht