I am trying to use curl to get web pages from a particular website, however it gives this error:
curl -q -v -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://ww
The problem is not the certificate of this site. The debug output clearly shows that the TLS handshake completes successfully, and outside of this handshake the certificate does not matter.
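If you want to double-check this independently of curl, one way (just a sketch, using the hostname from the question) is to run only the TLS handshake with openssl:

$ # complete just the TLS handshake (SNI set via -servername), then exit
$ openssl s_client -connect www.saiglobal.com:443 -servername www.saiglobal.com </dev/null

If this shows the certificate chain and the handshake completes, the TLS layer is fine and the problem sits above it, at the HTTP level.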
However, it can be seen that the site www.saiglobal.com is served through the Akamai CDN, and Akamai offers some kind of bot detection:
$ dig www.saiglobal.com
...
www.saiglobal.com. 45 IN CNAME www.saiglobal.com.edgekey.net.
www.saiglobal.com.edgekey.net. 62 IN CNAME e9158.a.akamaiedge.net.
This bot detection is known to use heuristics to distinguish bots from normal browsers, and detection of a bot might result in a 403 Access Denied status code or simply in the site hanging - see Scraping attempts getting 403 error or Requests SSL connection timeout.
In this specific case it currently seems to help if some specific HTTP headers are added: Accept-Encoding, Accept-Language, Connection with a value of keep-alive, and a User-Agent which contains Mozilla. Failing to add these headers, or using the wrong values, will result in a hang.
The following currently works for me:
$ curl -q -v \
-H "Connection: keep-alive" \
-H "Accept-Encoding: identity" \
-H "Accept-Language: en-US" \
-H "User-Agent: Mozilla/5.0" \
https://www.saiglobal.com/
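Since a failed detection shows up as a hang rather than an error, it can also help to put an upper bound on the request time so that a script does not block forever. A minimal variation of the command above (the 15-second limit is just an arbitrary choice):

$ curl -q -sS --max-time 15 \
  -H "Connection: keep-alive" \
  -H "Accept-Encoding: identity" \
  -H "Accept-Language: en-US" \
  -H "User-Agent: Mozilla/5.0" \
  https://www.saiglobal.com/

With --max-time curl gives up after the given number of seconds and exits with a non-zero status instead of hanging.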
Note that this deliberately tries to bypass the bot detection. It might stop working if Akamai changes how the detection works.
Please note also that the owner of the site has explicitly enabled bot detection for a reason. This means that by deliberately bypassing the detection for your own gain (like providing some service based on scraped information) you might run into legal problems.