Regular expression for parsing links from a webpage?

南旧 2020-11-27 20:02

I'm looking for a .NET regular expression to extract all the URLs from a webpage, but I haven't found one comprehensive enough to cover all the different ways you can spec

9 Answers
  • 2020-11-27 20:55

    All quoted http: and mailto: links:

    (["'])(mailto:|http:).*?\1
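A minimal sketch of how this pattern behaves, written in Python rather than .NET (an assumption made for illustration; the syntax used here is common to both engines). The sample HTML and the names `QUOTED_LINK`, `html`, and `links` are hypothetical.

```python
import re

# Pattern from the answer: an opening quote, a mailto: or http: scheme,
# then the shortest run of characters up to the same closing quote.
QUOTED_LINK = re.compile(r"""(["'])(mailto:|http:).*?\1""")

html = '''<a href="http://example.com/a">A</a>
<a href='mailto:me@example.com'>Mail</a>
<a href="/relative">R</a>'''

# group(0) includes the surrounding quotes, so strip one character per side.
links = [m.group(0)[1:-1] for m in QUOTED_LINK.finditer(html)]
print(links)
```

The relative link is skipped because it has no scheme. As written, the pattern would also skip `https:` links, since `http:` requires the colon directly after `http`.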
    

    All links, including relative ones, that are referenced by href or src:

    #Matches things in single or double quotes, but not the quotes themselves
    (?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)
    
    #Matches things in either double or single quotes, including the quotes.
    (["'])((?<=href=")|(?<=src=")).*?\1
    

    The second one will only get you links that use double quotes, however.
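The difference between the two href/src patterns can be sketched as follows. This is Python rather than .NET, chosen only for illustration, and the sample HTML and variable names are hypothetical.

```python
import re

# First pattern from the answer: lookbehinds keep the quote and the
# attribute name out of the match, so group(0) is the bare link.
BARE = re.compile(r"""(?<=(["']))((?<=href=['"])|(?<=src=['"])).*?(?=\1)""")

# Second pattern: the quote is consumed, and the lookbehinds only accept
# a double quote, so single-quoted attributes are skipped.
QUOTED = re.compile(r"""(["'])((?<=href=")|(?<=src=")).*?\1""")

html = """<img src='logo.png'> <a href="/about">About</a> <p class="x">"""

bare = [m.group(0) for m in BARE.finditer(html)]
quoted = [m.group(0) for m in QUOTED.finditer(html)]
print(bare)
print(quoted)
```

On this input the first pattern picks up both attributes, while the second returns only the double-quoted href, quotes included. Neither matches the `class` attribute, because the lookbehinds require `href=` or `src=` immediately before the quote.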

  • 2020-11-27 21:01

    URLs? As in images/scripts/css/etc.?

    %href="([^"]*)"%
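A hedged reading of this answer: the `%` signs look like PHP-style pattern delimiters rather than part of the expression, and the capture group is presumably meant to be "everything up to the next double quote", i.e. `[^"]*`. Under those assumptions, a Python sketch (sample HTML and names are hypothetical):

```python
import re

# Assumption: the % delimiters are dropped, and the group captures
# every character up to the next double quote.
HREF = re.compile(r'href="([^"]*)"')

html = '<link href="style.css"><a href="/docs/">Docs</a><a href=\'x\'>X</a>'

# With a single capture group, findall returns the captured text only.
hrefs = HREF.findall(html)
print(hrefs)
```

This catches only double-quoted `href` attributes. Images and scripts referenced through `src`, and single-quoted values, are not covered.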

  • 2020-11-27 21:05

    From the RegexBuddy library:

    URL: Find in full text

    The final character class makes sure that if a URL is part of some text, punctuation such as a comma or full stop after the URL is not interpreted as part of the URL. The character classes list only A-Z, so apply the pattern with the case-insensitive option.

    \b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]
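To see the trailing-punctuation rule in action, here is a small sketch in Python (not .NET; the language, sample text, and names are assumptions for illustration). The pattern lists only uppercase letters, so the sketch compiles it case-insensitively.

```python
import re

URL_IN_TEXT = re.compile(
    r"\b(https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|!:,.;]*[-A-Z0-9+&@#/%=~_|]",
    re.IGNORECASE,  # the character classes only list A-Z
)

text = "See http://example.com/page?id=1, or ftp://files.example.com/pub."

# The greedy class first swallows the trailing "," and ".", then backtracks
# because the final class does not allow either character.
urls = [m.group(0) for m in URL_IN_TEXT.finditer(text)]
print(urls)
```

The comma after the first URL and the full stop after the second are left out of the matches, which is the behavior the description above promises.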
