Download all PDF files using wget

一向 · 2021-02-04 22:35

I have the following site http://www.asd.com.tr. I want to download all PDF files into one directory. I've tried a couple of commands but am not having much luck.



        
1 Answer

    轻奢々 · 2021-02-04 23:12

    First, verify that the site's Terms of Service permit crawling it. Then one solution is:

    mech-dump --links 'http://domain.com' |
        grep -i '\.pdf$' |
        sed -E 's/\s+/%20/g' |
        xargs -I% wget http://domain.com/%
    

    The pipeline lists every link on the page, keeps the ones ending in .pdf, URL-encodes any spaces, and hands each resulting URL to wget. The mech-dump command ships with Perl's WWW::Mechanize module (the libwww-mechanize-perl package on Debian and Debian-like distributions).
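
    Alternatively, since the question specifically mentions wget, a recursive crawl can often do the whole job on its own when the PDFs are linked from pages on the same host. The sketch below assumes that layout; the crawl depth and the pdfs output directory are placeholders to adjust for the actual site:

    # Crawl up to two levels deep (an assumed depth), keep only files ending
    # in .pdf/.PDF, and flatten directories so everything lands in ./pdfs
    wget --recursive --level=2 --no-parent \
        --accept pdf,PDF \
        --no-directories --directory-prefix=pdfs \
        http://www.asd.com.tr/

    With --accept, wget still downloads the HTML pages it needs in order to follow links, but deletes anything that does not match the suffix list, so only the PDFs remain. The same Terms of Service caveat applies here as well.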
