The following did not work:
wget -r -A .pdf home_page_url
It stopped with the following message:
....
Removing site.com
This may be because of a robots.txt file; try adding -e robots=off.
Other possible problems are cookie-based authentication or user-agent rejection by wget. See these examples.
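If cookie-based authentication or the user agent is what blocks wget, flags along these lines can be combined with the command above; cookies.txt and the Mozilla string are placeholder values, not something taken from the question:
wget -r --load-cookies cookies.txt --user-agent="Mozilla/5.0" home_page_url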
EDIT: The dot in ".pdf" is wrong according to sunsite.univie.ac.at.
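Putting these pieces together, a sketch of a corrected command could look like this (home_page_url stands for the placeholder from the question, not a real address):
wget -r -A pdf -e robots=off home_page_url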
The following command works for me; it will download the pictures of a site:
wget -A pdf,jpg,png -m -p -E -k -K -np http://site/path/
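For reference, roughly what the individual options do, based on the wget manual:
-A pdf,jpg,png   accept only files with these suffixes
-m               mirror: recursive download with infinite depth and timestamping
-p               fetch the page requisites (images, CSS) needed to display each page
-E               add an .html extension to downloaded HTML files
-k               convert links so the local copy can be browsed offline
-K               keep the original file with a .orig suffix before converting it
-np              never ascend to the parent directory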
This is certainly because the links in the HTML don't end with /.
Wget will not follow this, as it thinks it's a file (which doesn't match your filter):
<a href="link">page</a>
But it will follow this:
<a href="link/">page</a>
You can use the --debug option to see whether this is the actual problem.
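For example, something along these lines writes the debug output to a log file you can search for the rejected links; home_page_url and the log file name are placeholders, not values from the question:
wget --debug -o wget-debug.log -r -A pdf home_page_url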
I don't know of any good solution for this. In my opinion, this is a bug.