I have a pretty generic spider that I do broad crawls with. I feed it a couple hundred starting URLs, limit the allowed_domains and let it go wild (I'm following the suggested