Downloading all files of a particular type from a website with wget stops at the starting URL

心在旅途 2021-02-09 02:41

The following did not work.

wget -r -A .pdf home_page_url

It stops with the following message:

....
Removing site.com         


        
3 answers
  • 2021-02-09 02:58

    wget may be obeying the site's robots.txt. Try adding -e robots=off.

    Other possible problems are cookie-based authentication or the server rejecting wget's user agent. See these examples.

    EDIT: The dot in ".pdf" is wrong according to sunsite.univie.ac.at
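A minimal sketch of the suggestions in this answer, not a confirmed fix. The URL, the User-Agent string, and the cookies.txt file name are placeholders assumed for illustration, and the small run helper is hypothetical: it only prints each command unless DRY_RUN is set to 0.

```shell
url="http://site.com/"        # placeholder for home_page_url
DRY_RUN="${DRY_RUN:-1}"       # 1 = only print the command, 0 = run it

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Ignore robots.txt; the suffix is written without the leading dot.
run wget -r -A pdf -e robots=off "$url"

# Same, but also send a browser-like User-Agent (example string).
run wget -r -A pdf -e robots=off --user-agent="Mozilla/5.0" "$url"

# Same, but reuse cookies exported from a logged-in browser session
# (cookies.txt is an assumed file name).
run wget -r -A pdf -e robots=off --load-cookies cookies.txt "$url"
```

Try the variants one at a time with DRY_RUN=0; whichever one gets past the starting page points to the cause.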

  • 2021-02-09 03:07

    The following command works for me; it downloads the PDFs and pictures of a site:

    wget -A pdf,jpg,png -m -p -E -k -K -np http://site/path/
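For readers unfamiliar with the short flags, here is the same command spelled out with wget's long option names. This is a sketch for readability only: the URL is the placeholder from the answer, and the command is printed rather than run.

```shell
url="http://site/path/"   # placeholder URL from the answer

# --accept            (-A)  keep only files with these suffixes
# --mirror            (-m)  recursive download with timestamping, infinite depth
# --page-requisites   (-p)  also fetch images, CSS, etc. needed by each page
# --adjust-extension  (-E)  save HTML pages with an .html suffix
# --convert-links     (-k)  rewrite links so the local copy is browsable
# --backup-converted  (-K)  keep the original of each converted file as .orig
# --no-parent         (-np) never ascend above the starting directory
set -- \
    --accept pdf,jpg,png \
    --mirror \
    --page-requisites \
    --adjust-extension \
    --convert-links \
    --backup-converted \
    --no-parent \
    "$url"

# Print the assembled command for review; run it with: wget "$@"
echo wget "$@"
```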
    
  • 2021-02-09 03:16

    This is most likely because the links in the HTML don't end with /.

    Wget will not follow this link, as it thinks it's a file (one that doesn't match your filter):

    <a href="link">page</a>
    

    But will follow this:

    <a href="link/">page</a>
    

    You can use the --debug option to see if it's the actual problem.

    I don't know of any good solution for this. In my opinion this is a bug.
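One way to apply the --debug suggestion is to write the debug output to a log file and filter it. This is a rough sketch: wget.log is an assumed file name, show_decisions is a hypothetical helper, and the grep patterns are assumptions about the log wording that may need adjusting for your wget version.

```shell
log="wget.log"            # hypothetical log file name

# Step 1 (needs network access, so it is left commented out here):
#   wget -r -A pdf --debug -o "$log" home_page_url

# Step 2: keep only the lines where wget reports a decision about a link
# or removes a downloaded file.
show_decisions() {
    grep -E 'Deciding whether to enqueue|Decided|Removing' "$1"
}

if [ -f "$log" ]; then
    show_decisions "$log" || echo "no matching lines in $log"
fi
```

If the links without a trailing / never show up in the filtered output, wget is not even considering them, which would support the explanation above.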
