Question
I'm looking for a preg_match_all pattern to find all URLs on a page that don't have a trailing slash.
For example, if I have:
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
the result should be the second link (#2).
Thanks.
Answer 1:
It is better to extract all your href links with a DOM parser and then check whether each URL ends with a slash. No regex is needed for that.
For a regex solution that handles the examples provided, you can use this pattern:
/href=(['"])[^\s]+(?<!\/)\1/
Live Demo: http://www.rubular.com/r/f2XJ6rF5Fb
Explanation:
href= -> match text href=
(['"]) -> match single or double quote and create a group #1 with this match
[^\s]+ -> match one or more non-whitespace characters
(?<!\/) -> (negative lookbehind) only match if the closing quote is not immediately preceded by /
\1 -> match closing single or double quote (group #1)
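Since the question asks about preg_match_all specifically, here is a minimal sketch of how the pattern above might be used. The sample markup comes from the question; the extra capture group #2 around the URL and the variable names are assumptions added for this illustration, not part of the original pattern.

```php
<?php
// Sample markup from the question.
$html = <<<'HTML'
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
HTML;

// Same pattern as above, plus an assumed capture group #2 around the URL
// so the bare URL can be read without the surrounding href="...".
$pattern = '/href=([\'"])([^\s]+(?<!\/))\1/';

// preg_match_all() returns the number of full matches found.
$count = preg_match_all($pattern, $html, $matches);

// $matches[2] holds the URLs that do not end with a slash.
var_dump($count, $matches[2]);
```

Only the second link should be reported here. Because [^\s]+ is greedy and also accepts quote characters, this sketch is only expected to behave on simple markup like the sample.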
Answer 2:
Indeed, use a DOM parser. Here's an example:
// let's define some HTML
$html = <<<'HTML'
<html>
<head>
</head>
<body>
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
</body>
</html>
HTML;
// create a DOMDocument instance (a DOM parser)
$dom = new DOMDocument();
// load the HTML
$dom->loadHTML( $html );
// create a DOMXPath instance, to query the DOM
$xpath = new DOMXPath( $dom );
// find all nodes containing an href attribute, and return the attribute node
$linkNodes = $xpath->query( '//*[@href]/@href' );
// initialize a result array
$result = array();
// iterate all found attribute nodes
foreach( $linkNodes as $linkNode )
{
    // does its value not end with a forward slash?
    if( substr( $linkNode->value, -1 ) !== '/' )
    {
        // add the attribute value to the result array
        $result[] = $linkNode->value;
    }
}
// let's look at the result
var_dump( $result );
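As a possible variation on the example above, the trailing-slash check can be moved into the XPath expression itself. This is only a sketch: XPath 1.0 (which DOMXPath implements) has no ends-with() function, so the last character is taken with substring() and string-length(). The compact $html string is an assumption made for brevity.

```php
<?php
// Same two links as in the example above, in a compact form.
$html = '<html><body>'
      . '<a href="/testing/abc/">end with slash</a>'
      . '<a href="/testing/test/mnl">no ending slash</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select href attributes whose last character is not a forward slash.
$query = "//*[@href][substring(@href, string-length(@href)) != '/']/@href";

$result = array();
foreach ($xpath->query($query) as $hrefNode) {
    $result[] = $hrefNode->value;
}

var_dump($result);
```

This keeps the filtering in one place, at the cost of a less readable query than the substr() check in PHP.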
Source: https://stackoverflow.com/questions/15414909/find-pattern-for-url-with-no-ending-slash