Weird error using PHP Simple HTML DOM parser

I am using this library (PHP Simple HTML DOM parser) to parse a link; here's the code:

function getSemanticRelevantKeywords($keyword){
    $results = array(

9 answers

Answer by 暗喜, 2020-11-29 10:55

This error means the find() method is either not defined yet or not available. Make sure you have loaded (included) the file that defines it.
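One way to confirm the library is actually loaded before parsing is to check for the loader functions that simple_html_dom.php defines (file_get_html() and str_get_html()). This is a minimal sketch: the helper name missing_functions() is hypothetical, and logging and continuing is only one possible reaction.

```php
<?php
// Hypothetical helper: returns the names from $names that are NOT defined yet.
// file_get_html() and str_get_html() come from simple_html_dom.php; if they are
// missing, the library was never included, so no object with find() can exist.
function missing_functions(array $names): array
{
    return array_values(array_filter($names, function (string $name): bool {
        return !function_exists($name);
    }));
}

$missing = missing_functions(['file_get_html', 'str_get_html']);
if ($missing !== []) {
    // Assumption: logging is enough here; you may prefer to stop the script instead.
    error_log('Not loaded: ' . implode(', ', $missing) . '. Include simple_html_dom.php first.');
}
```

In real use, put require_once 'simple_html_dom.php'; (with the correct path) above this check.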

Answer by 礼貌的吻别, 2020-11-29 10:56

For those arriving here via a search engine (as I did): after reading the information (and the linked bug report) above, I did some code-prodding and fixed my problems with two extra checks after loading the DOM:

$html = file_get_html('<your url here>');
// first check if $html->find exists
if (method_exists($html,"find")) {
     // then check if the html element exists to avoid trying to parse non-html
     if ($html->find('html')) {
          // and only then start searching (and manipulating) the dom 
     }
}
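The two checks above can be folded into one reusable guard. This is a sketch: the name looks_like_parsed_html() is hypothetical, and it relies only on the find() method the answer already uses (called without an index, find() returns an array of matches).

```php
<?php
// Hypothetical helper combining the two checks from the answer above.
// If loading fails, file_get_html() gives back false rather than an object.
function looks_like_parsed_html($html): bool
{
    if (!is_object($html) || !method_exists($html, 'find')) {
        return false; // nothing to search: loading failed or wrong type
    }
    // An empty result means there is no <html> element, i.e. probably not HTML.
    $matches = $html->find('html');
    return is_array($matches) && count($matches) > 0;
}
```

Usage: $html = file_get_html($url); if (looks_like_parsed_html($html)) { /* search the DOM */ }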

Answer by Happy的楠姐, 2020-11-29 10:59

You just need to increase the constant MAX_FILE_SIZE in simple_html_dom.php.

For example: 

define('MAX_FILE_SIZE', 999999999999999);
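A variant that avoids editing the library file is to define the constant in your own script before the include. PHP's define() never overwrites an existing constant, so the first definition wins. Some versions of simple_html_dom.php wrap their own define() in a defined() check; in versions that call it unconditionally, your value should still win but PHP reports an "already defined" notice or warning, so check your copy. The 10 MiB limit and the include path below are assumptions.

```php
<?php
// Must run BEFORE simple_html_dom.php is included: define() does not overwrite
// an existing constant, so whichever definition runs first is kept.
if (!defined('MAX_FILE_SIZE')) {
    define('MAX_FILE_SIZE', 10 * 1024 * 1024); // 10 MiB in bytes, an arbitrary example
}

// Assumed path to your copy of the library; uncomment in real use.
// require_once __DIR__ . '/simple_html_dom.php';

echo 'MAX_FILE_SIZE = ', MAX_FILE_SIZE, PHP_EOL;
```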

Answer by 遥遥无期, 2020-11-29 11:04

The reason for this error is that Simple HTML DOM does not return a DOM object when the response from the URL is larger than 600000 bytes (the default MAX_FILE_SIZE); file_get_html() returns false instead, so the following find() call has nothing to work on.

You can avoid this by editing simple_html_dom.php: remove strlen($contents) > MAX_FILE_SIZE from the if condition in the file_get_html function.

This will solve your issue.
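To see why a large page yields false instead of an object, the guard described above can be mirrored in a few lines. This is an approximation of the check inside file_get_html() (compare it with your copy of simple_html_dom.php); the function name would_be_rejected() is hypothetical. Note that strlen() counts bytes, not characters.

```php
<?php
// Approximates the guard in file_get_html(): empty contents (including false from a
// failed file_get_contents()) or contents longer than the limit are rejected.
const DEFAULT_MAX_FILE_SIZE = 600000; // default limit in bytes, per the answer above

function would_be_rejected($contents, int $maxFileSize = DEFAULT_MAX_FILE_SIZE): bool
{
    return empty($contents) || strlen($contents) > $maxFileSize;
}

$bigPage = str_repeat('<p>x</p>', 100000); // 8 bytes * 100000 = 800000 bytes
var_dump(would_be_rejected($bigPage));     // bool(true)
```

Logging strlen() of the fetched page next to MAX_FILE_SIZE is a quick way to confirm that this, and not a missing include, is the cause.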

Answer by 暗喜, 2020-11-29 11:05

I'm seeing the same error in my logs. Apart from the causes mentioned above, it could also be that there is no 'span' in the document: I get the same error when searching for divs with a class that doesn't exist on the page, but when I search for something I know exists on the page, the error doesn't appear.
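When a selector may match nothing, check the result before using it. With an index argument, find($selector, 0) returns null if there is no match, so chaining another call or property onto it fails on such pages. A minimal sketch; the helper name first_text() and its default-value behaviour are hypothetical choices.

```php
<?php
// Hypothetical helper: plaintext of the first element matching $selector, or $default.
function first_text($dom, string $selector, string $default = ''): string
{
    if (!is_object($dom) || !method_exists($dom, 'find')) {
        return $default; // the page never loaded (e.g. file_get_html() returned false)
    }
    $node = $dom->find($selector, 0); // null when nothing matches
    if ($node === null) {
        return $default; // selector matched nothing on this page
    }
    return trim((string) $node->plaintext);
}
```

Usage: $title = first_text($html, 'span', 'n/a'); never touches ->plaintext on null.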

Answer by 离开以前, 2020-11-29 11:07

Before calling file_get_html() or load_file(), you should first check whether the URL exists.

If the URL exists, the first check passes.

Note that some servers serve their 404 error page as a valid HTML document with the usual structure (head, body, etc.) that only contains text like "This page couldn't be found. 404 error, blah blah...", so a successful parse alone does not prove you got the page you wanted.

If the URL returns 200 OK, you should then check whether the fetched result is an object and whether its nodes are set.

Here's the code I use in my pages:

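require_once 'simple_html_dom.php'; // path to your copy of the library

function url_exists($url){
    if (strpos($url, "http") === false) $url = "http://" . $url;
    $headers = @get_headers($url);
    // print_r($headers);
    if (!is_array($headers)) {
        return false; // request failed (DNS error, timeout, ...)
    }
    // $headers[0] is the first status line, e.g. "HTTP/1.1 404 Not Found"
    return strpos($headers[0], '404 Not Found') === false;
}

$pageAddress = 'http://www.google.com';
$htmlPage = new simple_html_dom(); // must exist before load_file() is called
if ( url_exists($pageAddress) ) {
    $htmlPage->load_file( $pageAddress );
} else {
    echo "URL doesn't exist, stopping.";
    return;
}

// $nodes starts out as an empty array, so check that it is non-empty, not merely set
if ( is_object($htmlPage) && !empty($htmlPage->nodes) )
{
    // do your work here...
} else {
    echo 'Fetched page is not OK, stopping.';
    return;
}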
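The url_exists() check above only rules out a literal "404 Not Found", while the text asks for 200 OK. A stricter sketch parses the numeric status code and accepts only 2xx. By default get_headers() follows redirects and returns a status line for each hop in its numerically indexed array, so the last status line is the one that matters. The helper names are hypothetical.

```php
<?php
// Hypothetical helper: numeric code from a status line such as "HTTP/1.1 404 Not Found".
// Returns null when the line is an ordinary header rather than a status line.
function status_code_from_line(string $line): ?int
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Status code of the final response: later status lines (after redirects) win.
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (!is_string($line)) {
            continue;
        }
        $lineCode = status_code_from_line($line);
        if ($lineCode !== null) {
            $code = $lineCode;
        }
    }
    return $code;
}

function is_success_status(?int $code): bool
{
    return $code !== null && $code >= 200 && $code < 300;
}
```

Usage: $headers = @get_headers($url); $ok = is_array($headers) && is_success_status(final_status_code($headers));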