Regex to replace relative link with root relative link

Submitted by 喜欢而已 on 2019-12-06 12:40:08

Question


I have a string containing HTML with several kinds of links (relative, absolute, root-relative). I need a regex for PHP's preg_replace that replaces all relative links with root-relative links without touching any of the others. I already have the root path.

Replaced links:

<tag ... href="path/to_file.ext" ... >   --->   <tag ... href="/basepath/path/to_file.ext" ... >
<tag ... href="path/to_file.ext" ... />   --->   <tag ... href="/basepath/path/to_file.ext" ... />

Untouched links:

<tag ... href="/any/path" ... >
<tag ... href="/any/path" ... />
<tag ... href="protocol://domain.com/any/path" ... >
<tag ... href="protocol://domain.com/any/path" ... />

Answer 1:


If you just want to change the base URI, you can try the BASE element:

<base href="/basepath/">

But note that changing the base URI affects all relative URIs and not just relative URI paths.

Otherwise, if you really want to use a regular expression, note that the kind of relative path you want to match must be of the type path-noscheme (see RFC 3986):

path-noscheme = segment-nz-nc *( "/" segment )
segment       = *pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                ; non-zero-length segment without any colon ":"
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
pct-encoded   = "%" HEXDIG HEXDIG
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So the beginning of the URI must match:

^([a-zA-Z0-9-._~!$&'()*+,;=@]|%[0-9a-fA-F]{2})+($|/)

But please use a proper HTML parser to parse the HTML and build a DOM from it. Then you can query the DOM for the href attributes and test each value with the regular expression above.
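The parser-based approach could look roughly like the sketch below. It assumes PHP's bundled DOM extension is available; the function name `rebase_relative_hrefs`, the constant `RELATIVE_PATH_RE`, and the `$rootPath` parameter (with a trailing slash, as in `/basepath/`) are made up for illustration. The pattern is the one given above, wrapped in `#` delimiters with the hyphen escaped.

```php
<?php
// The path-noscheme test from above: a non-empty first segment without a colon,
// followed by "/" or the end of the string. "#" is the delimiter because the
// pattern itself contains "/".
const RELATIVE_PATH_RE = '#^([a-zA-Z0-9\-._~!$&\'()*+,;=@]|%[0-9a-fA-F]{2})+($|/)#';

// Hypothetical helper: prefixes $rootPath to every relative href in $html.
function rebase_relative_hrefs(string $html, string $rootPath): string
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every element that carries an href attribute (a, link, area, ...).
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href]') as $element) {
        $href = $element->getAttribute('href');
        if (preg_match(RELATIVE_PATH_RE, $href) === 1) {
            $element->setAttribute('href', $rootPath . $href);
        }
    }

    // Serializes the whole document, including the doctype, html and body
    // wrappers that loadHTML() adds around a fragment.
    return $doc->saveHTML();
}
```

Root-relative links, links with a scheme, and fragment-only or query-only references do not match the pattern, so they are left unchanged. Note that the result is a full document rather than the original fragment, so some post-processing may be needed if only the fragment is wanted.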




Answer 2:


I came up with this:

preg_replace('#href=["\']([^/][^\':"]*)["\']#', 'href="'.$root_path.'$1"', $html);

It might be a little too simplistic. The obvious flaw I see is that it will also match href="something" when it is outside of a tag, but hopefully it can get you started.
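One possible way to narrow that flaw is sketched below: the pattern only matches an href that follows a `<` with no intervening `<` or `>`, and the replacement writes the attribute name and the original quote character back out. The function name `prefix_relative_hrefs` and the `$rootPath` parameter are hypothetical, and this is still a heuristic rather than a real HTML parse (an attribute value containing `>`, for instance, would confuse it).

```php
<?php
// Hypothetical helper: prefixes $rootPath to relative hrefs found inside tags.
function prefix_relative_hrefs(string $html, string $rootPath): string
{
    // Group 1: "<" up to and including " href=", with no "<" or ">" in between,
    //          so the attribute has to sit inside an opening tag.
    // Group 2: the opening quote, matched again by \2 at the end.
    // Group 3: the path. The first character must not be "/" or a quote, and
    //          the rest must not contain a quote or ":" (which skips scheme URLs).
    $pattern = '#(<[^<>]*?\shref=)(["\'])([^/\'"][^\':"]*)\2#';

    // A callback is used so that "$" or "\" in $rootPath is never read as a
    // backreference in a replacement string.
    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($rootPath): string {
            return $m[1] . $m[2] . $rootPath . $m[3] . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}
```

Like the one-liner above, this treats anything that does not start with `/` and contains no colon as relative, so a fragment link such as `href="#top"` would also get the prefix.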



Source: https://stackoverflow.com/questions/2869844/regex-to-replace-relative-link-with-root-relative-link
