Fastest way to retrieve a <title> in PHP

I'm building a bookmarking system and looking for the fastest (easiest) way to retrieve a page's title with PHP.

It would be nice to have something like $title


                      
7 answers

小鲜肉 (2020-11-28 12:20)

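You can get it without regular expressions:

// $urlpage holds the URL (or local path) of the page to inspect.
$title = '';
$dom = new DOMDocument();

// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

if ($dom->loadHTMLFile($urlpage)) {
    $list = $dom->getElementsByTagName("title");
    if ($list->length > 0) {
        // textContent is the text of the first <title>, with entities decoded.
        $title = $list->item(0)->textContent;
    }
}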
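
When the HTML is already in memory (for example because it was downloaded separately), the same DOM approach also works on a string. The following is a minimal sketch rather than a definitive implementation: the function name title_from_html is made up for illustration, and it assumes the dom extension is available.

```php
<?php
// Hypothetical helper (the name is illustrative): extract the <title> text from an HTML string.
function title_from_html(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    // Collect libxml warnings about malformed markup instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $list = $dom->getElementsByTagName('title');
    if ($list->length === 0) {
        return null;
    }

    // textContent returns the text with entities such as &lt; already decoded.
    return trim($list->item(0)->textContent);
}

$html = '<html><head><title>Fastest way to retrieve a &lt;title&gt; in PHP</title></head><body></body></html>';
echo title_from_html($html), "\n";
```

Because the parser decodes entities, a title that itself mentions a tag comes back as readable text, which is where the pure regex approaches tend to struggle.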

                                                                        
                                                        
            
            
              
                
情话喂你 (2020-11-28 12:32)

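I'm also building a bookmarking system and found that, since PHP 5, you can use stream_get_line to load the remote page only up to the closing title tag (instead of loading the whole file), and then get rid of what comes before the opening title tag with explode (instead of a regex).

function page_title($url) {
  $title = false;
  if ($handle = fopen($url, "r")) {
    // Read only up to the closing title tag. Per the PHP manual, a length of 0 means
    // the default chunk size (8192 bytes), so a title beyond that point is not found.
    $string = stream_get_line($handle, 0, "</title>");
    fclose($handle);
    // Drop everything before the opening tag; index 1 is absent when there is no "<title".
    $parts = explode("<title", (string) $string, 2);
    if (isset($parts[1])) {
      // Skip the rest of the opening tag, including any attributes.
      $rest = explode(">", $parts[1], 2);
      if (isset($rest[1])) {
        $title = trim($rest[1]);
      }
    }
  }
  return $title;
}


The last explode is thanks to PlugTrade's answer, which reminded me that title tags can have attributes.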
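
A variant of the idea above that reads in chunks up to an explicit byte limit and matches the tag case-insensitively might look like the following. This is only a sketch: title_from_stream and its $maxBytes parameter are names invented here, and the demo uses an in-memory stream in place of a remote URL so that it runs without network access.

```php
<?php
// Hypothetical helper: read from an open stream until </title> shows up or a byte cap is hit.
function title_from_stream($handle, int $maxBytes = 65536): ?string
{
    $buffer = '';
    while (!feof($handle) && strlen($buffer) < $maxBytes) {
        $chunk = fread($handle, 4096);
        if ($chunk === false || $chunk === '') {
            break; // nothing more to read
        }
        $buffer .= $chunk;
        // Search the whole buffer so a tag split across two chunks is still found.
        if (stripos($buffer, '</title>') !== false) {
            break;
        }
    }

    // [^>]* skips attributes on the opening tag; i ignores case, s lets the title span lines.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $buffer, $m)) {
        return trim($m[1]);
    }
    return null;
}

// Demo with an in-memory stream standing in for fopen($url, 'r').
$handle = fopen('php://memory', 'w+');
fwrite($handle, '<html><head><TITLE lang="en"> Example page </TITLE></head><body>...</body></html>');
rewind($handle);
echo title_from_stream($handle), "\n";
fclose($handle);
```

The byte cap keeps a page without a title from being downloaded in full; the value 65536 is an arbitrary choice for the sketch.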
                                                                        
                                                        
            
            
              
                
野的像风 (2020-11-28 12:35)

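A function to handle title tags that have attributes added to them:

function get_title($html)
{
    // s: dot also matches newlines, i: case-insensitive, U: ungreedy.
    // Group 1 captures the rest of the opening tag (attributes and ">") plus the title text.
    if (preg_match("/<title(.+)<\/title>/siU", $html, $matches)) {
        // Split once at the ">" that closes the opening tag; the remainder is the title text.
        $parts = explode('>', $matches[1], 2);
        if (isset($parts[1])) {
            return trim($parts[1]);
        }
    }
    return null; // no title found
}

$html = '<tiTle class="aunt">jemima</tiTLE>';
$title = get_title($html);
echo $title; // jemima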

                                                                        
                                                        
            
            
              
                
迷失自我 (2020-11-28 12:36)

Regex?

Use cURL to get the $htmlSource variable's contents.

// i: case-insensitive, s: dot also matches newlines, U: ungreedy (stop at the first </title>).
preg_match('/<title>(.*)<\/title>/siU', $htmlSource, $titleMatches);

print_r($titleMatches);


See what you have in that array.

Most people say that for traversing HTML you should use a parser, though, as regexes can be unreliable.

The other answers provide more detail :)
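
For the cURL step mentioned above, a rough sketch of how $htmlSource could be filled is shown here. It assumes the curl extension is installed; the function name fetch_html, the timeout, and the redirect limit are illustrative choices, not part of the original answer.

```php
<?php
// Hypothetical helper: download a page with cURL and return its HTML, or null on failure.
function fetch_html(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init();
    if ($ch === false) {
        return null;
    }

    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);

    // curl_exec() returns the body as a string here, or false when the transfer fails.
    $htmlSource = curl_exec($ch);

    return is_string($htmlSource) ? $htmlSource : null;
}
```

The result of fetch_html($url) can then be passed to preg_match as $htmlSource once it has been checked for null.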
                                                                        
                                                        
            
            
              
                
被撕碎了的回忆 (2020-11-28 12:36)

I like using SimpleXML together with regexes. This is adapted from a solution I use to grab multiple link headers from a page in an OpenID library I've created; I've changed it to work with the title (even though there is usually only one).

function getTitle($sFile)
{
    $sData = file_get_contents($sFile);

    // Isolate the <head> element so only that part is parsed.
    if ($sData !== false && preg_match('/<head\b[^>]*>.*<\/head>/is', $sData, $aHead))
    {
        // Lowercase every tag. strtolower() must run on each match, hence the callback;
        // strtolower('<$1>') would only lowercase the literal replacement string.
        $sDataHtml = preg_replace_callback('/<[^>]*>/', function ($aTag) {
            return strtolower($aTag[0]);
        }, $aHead[0]);

        // loadHTML() is an instance method; calling it statically is an error in PHP 8.
        $oDom = new DOMDocument();
        @$oDom->loadHTML($sDataHtml); // @ hides warnings about malformed markup
        $xTitle = simplexml_import_dom($oDom);
        if (!$xTitle) return null;

        return (string)$xTitle->head->title;
    }
    return null;
}

echo getTitle('http://stackoverflow.com/questions/399332/fastest-way-to-retrieve-a-title-in-php');


Ironically, this page has a "title tag" inside the title tag, which is what sometimes causes problems with the pure regex solutions.

This solution is not perfect, as it lowercases the tags, which could cause a problem for the nested tag if formatting or case was important (such as in XML), but there are somewhat more involved ways around that problem.
                                                                        
                                                        
            
            
              
                
情深已故 (2020-11-28 12:37)

Or make this simple function slightly more bulletproof:

function page_title($url) {

    $page = file_get_contents($url);

    // file_get_contents() returns false on failure.
    if (!$page) return null;

    $matches = array();

    // [^>]* allows attributes on the tag; i ignores case, s lets the title span lines.
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $page, $matches)) {
        return trim($matches[1]);
    } else {
        return null;
    }
}


echo page_title('http://google.com');
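
The regex-based functions return the raw markup between the tags, so entities and line breaks come through unchanged. One possible post-processing step is sketched below, under the assumption that the page is UTF-8; clean_title is a made-up name used only for this example.

```php
<?php
// Hypothetical helper: tidy a title that was captured with a regular expression.
function clean_title(string $rawTitle): string
{
    // Turn entities such as &amp; or &#39; back into characters.
    $decoded = html_entity_decode($rawTitle, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse line breaks and runs of whitespace into single spaces.
    $collapsed = preg_replace('/\s+/', ' ', $decoded);

    // preg_replace() returns null on error; fall back to the decoded text in that case.
    return trim($collapsed ?? $decoded);
}

echo clean_title("  Tom &amp; Jerry\n    Show "), "\n"; // Tom & Jerry Show
```

A DOM-based extraction does the entity decoding on its own, so this step mainly matters for the regex and explode variants.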

                                                                        
                                                        
            
            
              
                