Scrapy: Google Crawl doesn't work


When I try to crawl Google for search results, Scrapy just yields the Google home page: http://pastebin.com/FUbvbhN4

Here is my spider:

import scrapy


                      
2 Answers

佛祖请我去吃肉
2021-01-17 00:51

In most cases, Google redirects the spider to a CAPTCHA page. Bing search results are easier to crawl.

There is a project for crawling search results from Google/Bing/Baidu: https://github.com/titantse/seCrawler

独厮守ぢ
2021-01-17 00:58

Yes, it looks like that address ends up at the home page.

Example with scrapy shell http://www.google.com/#q=finance.google.com:+3m+co:

...
[s]   request    <GET http://www.google.com/#q=finance.google.com:+3m+co>
[s]   response   <200 http://www.google.com/>
...


Looking at your URL, this makes sense: it contains no query parameters, only #q=..., which is a URL fragment rather than a parameter. The fragment is never sent to the server. The browser is what recognizes it and turns it into a Google search, so the server only ever sees a request for the home page.

The correct Google search URL is: http://www.google.com/search?q=YOURQUERY
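
To make the difference concrete, here is a minimal sketch using only Python's standard library. It splits the URL from the question to show that the search terms sit in the fragment, then builds a URL that carries them as a real q parameter. The helper name build_search_url and the query text (read off the fragment, assuming each + stands for a space) are illustrative assumptions, not something from the original spider.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# URL from the question: everything after '#' is a fragment.
original = "http://www.google.com/#q=finance.google.com:+3m+co"
parts = urlsplit(original)
print(parts.path)      # the path that is actually requested from the server
print(parts.query)     # empty string: there are no query parameters
print(parts.fragment)  # stays on the client, not part of the HTTP request


def build_search_url(query: str) -> str:
    """Hypothetical helper: put the search terms in a real 'q' query parameter."""
    return urlunsplit(
        ("http", "www.google.com", "/search", urlencode({"q": query}), "")
    )


# Assumed query text; urlencode takes care of escaping spaces and ':'.
search_url = build_search_url("finance.google.com: 3m co")
print(search_url)
```

The resulting search_url is what you would put in the spider's start_urls in place of the #q= address. Note that Google may still answer with a CAPTCHA page, as the other answer points out.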
                                                                        
                                                        
            
            
              
                