Regular expression for parsing links from a webpage?

I'm looking for a .NET regular expression to extract all the URLs from a webpage, but haven't found one comprehensive enough to cover all the different ways you can spec…

9 Answers

挽巷

2020-11-27 20:55

All HTTP(S) and MAILTO links:

(["'])(mailto:|https?:).*?\1
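A minimal sketch of the quoted-link pattern in use, with Python's `re` assumed as a stand-in for .NET's `Regex`. It uses `https?:` so that HTTPS links are matched as well as plain HTTP. The helper name `quoted_links` and the sample HTML are hypothetical.

```python
import re

# Group 1 captures the opening quote; \1 requires the same quote character to close the value.
QUOTED_LINK = re.compile(r"""(["'])(mailto:|https?:).*?\1""")

def quoted_links(html):
    """Hypothetical helper: return each mailto:/http(s): value found between matching quotes."""
    # group(0) includes both quotes, so slice them off.
    return [m.group(0)[1:-1] for m in QUOTED_LINK.finditer(html)]

sample = """<a href="https://example.com">site</a> <a href='mailto:me@example.com'>mail</a>"""
print(quoted_links(sample))  # ['https://example.com', 'mailto:me@example.com']
```

Relative links such as "/about" are not matched, since the value must begin with mailto:, http: or https:.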

All links, including relative ones, that are referenced by href or src:

# Matches values in single or double quotes, but not the quotes themselves
(?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)
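A sketch of the lookaround pattern above, again assuming Python's `re` as a stand-in for .NET (both accept these fixed-width lookbehinds). Because the pattern contains capture groups, `findall()` would return group tuples, so the hypothetical helper `attr_links` reads `group(0)` from `finditer()` instead. The sample HTML is made up.

```python
import re

# Pattern from the answer: the lookbehinds require the value to start right after
# href=' / href=" / src=' / src=", and (?=\1) stops just before the same quote character.
ATTR_VALUE = re.compile(r"""(?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)""")

def attr_links(html):
    """Hypothetical helper: return href/src values (absolute or relative) without quotes."""
    return [m.group(0) for m in ATTR_VALUE.finditer(html)]

sample = """<a href="/about">About</a> <img src='logo.png'> <a href="https://example.com/">x</a>"""
print(attr_links(sample))  # ['/about', 'logo.png', 'https://example.com/']
```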

# Matches values in quotes, including the quotes themselves.
(["'])((?<=href=")|(?<=src=")).*?\1

The second one only finds links that use double quotes, however, because its lookbehinds only accept a double quote after href= and src=.
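A sketch showing the quote-including pattern and its double-quote limitation, with Python's `re` assumed as a stand-in for .NET. The helper `quoted_attr_links` and the sample HTML are hypothetical.

```python
import re

# Pattern from the answer: (["']) consumes the opening quote, then the lookbehinds check
# that the text just before (quote included) is href=" or src=". Because those lookbehinds
# spell out a double quote, single-quoted attributes never match.
QUOTED_ATTR = re.compile(r"""(["'])((?<=href=")|(?<=src=")).*?\1""")

def quoted_attr_links(html):
    """Hypothetical helper: return href/src values with their double quotes still attached."""
    return [m.group(0) for m in QUOTED_ATTR.finditer(html)]

sample = """<a href="/about">About</a> <img src='logo.png'>"""
print(quoted_attr_links(sample))  # ['"/about"']
```

The single-quoted src='logo.png' is skipped, which is the limitation described above.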

别那么骄傲

2020-11-27 21:01

URLs? As in images, scripts, CSS, etc.?

href="([^"]*)"
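A minimal Python sketch of the href pattern, using Python's `re` as an assumed stand-in for .NET. It uses a negated class `[^"]*` for the value, which is assumed to be the intent (`.["]*` would match one character followed by quotes), and omits the PCRE-style `%` delimiters, which neither .NET nor Python uses. The sample HTML is made up.

```python
import re

# Capture everything between href=" and the next double quote.
HREF = re.compile(r'href="([^"]*)"')

sample = '<link href="style.css"> <a href="/docs/index.html">Docs</a>'
# With exactly one capture group, findall() returns the captured values directly.
print(HREF.findall(sample))  # ['style.css', '/docs/index.html']
```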

抹茶落季

2020-11-27 21:05

From the RegexBuddy library:

URL: Find in full text

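The final character class makes sure that if a URL is part of some text, punctuation such as a comma or full stop after the URL is not interpreted as part of the URL. The character classes list only uppercase letters, so the pattern must be run case-insensitively (RegexOptions.IgnoreCase in .NET).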

\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
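A sketch of the RegexBuddy pattern applied to plain text, with Python's `re` assumed as a stand-in for .NET; `re.IGNORECASE` plays the role of `RegexOptions.IgnoreCase`. The helper `urls_in_text` and the sample sentence are hypothetical.

```python
import re

# RegexBuddy "URL: Find in full text" pattern. The classes list only A-Z,
# so it is compiled case-insensitively.
URL_IN_TEXT = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]",
    re.IGNORECASE,
)

def urls_in_text(text):
    """Hypothetical helper: return full URL matches (findall() would return only the scheme group)."""
    return [m.group(0) for m in URL_IN_TEXT.finditer(text)]

text = "See http://example.com/page, or ftp://files.example.org."
print(urls_in_text(text))  # ['http://example.com/page', 'ftp://files.example.org']
```

The trailing comma and full stop are left out because the final character class does not allow them.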
