Download all files of a particular type from a website using wget stops in the starting url

前端未结

关注

 3  1019

The following did not work.

wget -r -A .pdf home_page_url

It stop with the following message:

....
Removing site.com


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  独厮守ぢ        
                
              
                            
                2021-02-09 02:58
              
            
            
                                                                       
It may be based on a robots.txt. Try adding -e robots=off.

Other possible problems are cookie based authentication or agent rejection for wget.
See these examples.

EDIT: The dot in ".pdf" is wrong according to sunsite.univie.ac.at
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2021-02-09 03:07
              
            
            
                                                                       
the following cmd works for me, it will download pictures of a site

wget -A pdf,jpg,png -m -p -E -k -K -np http://site/path/

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  栀梦        
                
              
                            
                2021-02-09 03:16
              
            
            
                                                                       
This is certainly because of the links in the HTML don't end up with /.

Wget will not follow this has it think it's a file (but doesn't match your filter):

<a href="link">page</a>


But will follow this:

<a href="link/">page</a>


You can use the --debug option to see if it's the actual problem.

I don't know any good solution for this. In my opinion this is a bug.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复