I've written a web scraper in Python, and I have a ton (thousands) of files that are extremely similar but not quite identical. The disk space currently used by the files i