Extract title tag from HTML

Posted by 走远了吗 on 2020-01-13 06:12:44

Question


I want to extract the contents of the title tag from an HTML string. I have done some searching, but so far I have not been able to find such code in VB/C# or PHP. It should also work with both upper- and lower-case tags, e.g. with both <title></title> and <TITLE></TITLE>. Thank you.


Answer 1:


You can use a regular expression for this, but it's not completely error-proof. It will do if you just want something simple, though (in PHP):

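function get_title($html) {
  // "i" makes the match case-insensitive (<title> or <TITLE>);
  // "s" lets "." match newlines too, for titles split across lines.
  return preg_match('!<title>(.*?)</title>!is', $html, $matches) ? $matches[1] : '';
}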
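Because regex matching is not error-proof, one alternative worth considering is to let a real HTML parser do the work. The sketch below swaps in PHP's built-in DOMDocument class, which is not part of the answer above; `get_title_dom` is a hypothetical name, and the sketch assumes the dom extension is enabled.

```php
<?php
// Sketch: extract <title> with PHP's DOM extension instead of a regex.
// Assumes ext-dom is available; get_title_dom is a hypothetical name.
function get_title_dom(string $html): string
{
    // DOMDocument::loadHTML() rejects an empty string (ValueError in PHP 8).
    if ($html === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // libxml's HTML parser lower-cases element names, so <TITLE> is found too.
    $nodes = $doc->getElementsByTagName('title');

    // textContent decodes entities such as &amp; into plain text.
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
}
```

A parser copes with attributes, odd casing and commented-out markup without any pattern tweaking, at the cost of parsing the whole document; for a quick one-off, the regex may still be good enough.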



Answer 2:


This sounds like a job for a regular expression. It relies on the HTML being well-formed, i.e. it only finds a title element that sits inside a head element.

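 // Requires: using System.Text.RegularExpressions;
 // IgnoreCase matches <title> and <TITLE>; Singleline lets "." span the
 // line breaks that normally separate <head> from </head>.
 Regex regex = new Regex( "<head>.*?<title>(.*?)</title>.*?</head>",
                          RegexOptions.IgnoreCase | RegexOptions.Singleline );
 Match match = regex.Match( html );
 // Groups[0] is the whole match; Groups[1] holds the captured title text.
 string title = match.Groups[1].Value;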

I don't have my regex cheat sheet in front of me, so it may need a little tweaking. Also note that there is no error checking for the case where no title element exists.
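The head-restricted idea from this answer can also be sketched in PHP, this time with the missing error check added: the hypothetical `get_head_title` function returns `null` when there is no head element or no title inside it. It is a minimal sketch that assumes reasonably well-formed markup, not a full parser.

```php
<?php
// Sketch: only accept a <title> that appears inside <head>...</head>.
// get_head_title is a hypothetical name; returns null if nothing is found.
function get_head_title(string $html): ?string
{
    // i: case-insensitive; s: "." also matches newlines.
    // \b[^>]* allows attributes (<head lang="en">) but rejects <header>.
    if (!preg_match('!<head\b[^>]*>(.*?)</head>!is', $html, $head)) {
        return null;
    }
    // Search for the title only within the captured head content.
    if (!preg_match('!<title\b[^>]*>(.*?)</title>!is', $head[1], $title)) {
        return null;
    }
    return trim($title[1]);
}
```

Returning `null` rather than an empty string lets a caller tell "no title" apart from an empty `<title></title>`.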




Answer 3:


If the title tag has any attributes (unlikely, but it can happen), you need to update the expression as follows:

// [^>]* skips any attributes without running past the opening tag's ">".
$title = preg_match('!<title[^>]*>(.*?)</title>!is', $url_content, $matches) ? $matches[1] : '';
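A slightly more defensive variant of this pattern, sketched in PHP under the assumption that the page is UTF-8: `[^>]*` skips attributes without running past the opening tag, `\b` rejects longer tag names that merely start with "title", and `html_entity_decode` turns entities such as `&amp;` into plain text. `get_title_attr` is a hypothetical name.

```php
<?php
// Sketch: attribute-tolerant, case-insensitive title extraction.
// Assumes UTF-8 input; get_title_attr is a hypothetical name.
function get_title_attr(string $html): string
{
    // \b: the tag name must end right after "title";
    // [^>]*: any attributes, stopping at the first ">".
    if (!preg_match('!<title\b[^>]*>(.*?)</title>!is', $html, $m)) {
        return '';
    }
    // Decode entities such as &amp; and strip surrounding whitespace.
    return trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}
```

With a greedy `<title.*>` pattern, a line such as `<title>A</title><title>B</title>` yields `B`, because `.*` backtracks only as far as the last `>` that still allows a match; the `[^>]*` version returns `A`.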


Source: https://stackoverflow.com/questions/717100/extract-title-tag-from-html
