I'd like to crawl a web site to build its sitemap.
Problem is, the site uses an .htaccess file to block spiders, so the following command only downloads the homepage:
You might want to set the User-Agent to something more specific than just "Mozilla", for example a full browser string like:
wget --user-agent="Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0"
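To actually build the sitemap, you can combine that User-Agent with wget's recursive spider mode and then extract the visited URLs from the log. This is a sketch, assuming the target is `https://example.com/` (substitute your site) and that the server blocks only on User-Agent:

```shell
# Crawl the site without saving pages (--spider), following links
# recursively (-r) up to 5 levels deep, logging every URL to wget.log.
wget --spider -r --level=5 -nv -o wget.log \
     --user-agent="Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0" \
     https://example.com/

# Pull the crawled URLs out of the log and deduplicate them.
grep -oP 'URL:\S+' wget.log | sed 's/^URL://' | sort -u > sitemap-urls.txt
```

Note that if the site blocks on something other than the User-Agent (e.g. robots.txt handling, request rate, or missing cookies), you may also need `-e robots=off` or `--wait` between requests; check the site's terms before bypassing robots.txt.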