Regular Expression to match both relative and absolute URLs

自闭症患者 2021-02-09 03:12

Anyone want to try their hand at coming up with a regex that matches both relative and absolute URLs, such as:

  • /foo/bar/baz.gif
  • /foo/bar/
  • http://www.foo.com/foo/bar

I

6 Answers
  •  我寻月下人不归
    2021-02-09 03:39

    (
      ((http|https|ftp)://([\w-]+\.)+[\w-]+)?  // Capture domain names or IP addresses (\w already covers digits)
      (/?[\w~,;\-\./?%&+#=]*)                   // Capture paths, including relative; the leading "/" is optional
    )
    

    Rationale for this answer:

    1. The whole thing is grouped so you can pick out the entire URL
    2. The protocol portion is optional, but if provided, a hostname or IP address should also be provided (both of which have fewer allowed characters than the rest of the URI).
    3. The "/" at the beginning is also optional. A path can take the form "images/1.gif", which is relative to the current path rather than to the hostname.
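
A minimal Python sketch of the three points above, not the answerer's own code. It assumes two adaptations: the host class is written `[\w-]` (Python's `re` module rejects `\w-\d` as a character range, and `\w` already covers digits), and the leading "/" is optional, as point 3 describes. The names `URL_RE`, `origin`, and `path` are made up for illustration.

```python
import re

# Optional scheme plus host name or IP address (rationale point 2).
ORIGIN = r"(?P<origin>(http|https|ftp)://([\w-]+\.)+[\w-]+)?"
# Path with an optional leading "/" (rationale point 3).
PATH = r"(?P<path>/?[\w~,;\-./?%&+#=]*)"
# The whole thing is grouped so the entire URL can be picked out (point 1).
URL_RE = re.compile("(" + ORIGIN + PATH + ")")

samples = [
    "/foo/bar/baz.gif",
    "/foo/bar/",
    "http://www.foo.com/foo/bar",
    "images/1.gif",
    "mailto:a@b.com",  # not supported, so no full match is expected
]

for text in samples:
    m = URL_RE.fullmatch(text)
    if m:
        print(text, "->", m.group("origin"), m.group("path"))
    else:
        print(text, "-> no match")
```

Since every part of the pattern is optional, it also matches the empty string, so with `search` or `finditer` on free text it helps to discard empty matches.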

    Caveats:

    1. mailto and file URIs are not supported.
    2. A URL followed by a period (such as at the end of a sentence, without quotation marks) will be captured with the trailing period included.
    3. Because of rationale point 3 above, the pattern is going to capture all sorts of things. If you can verify that no path is relative to the current path, you can add a "/" outside the parentheses and thus require it.
    4. If all URIs are within HTML attributes (A, LINK, IMG, etc.), you can target the URIs much more accurately by only capturing within quotes, or at least only within HTML tags.
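
Caveats 2 and 4 can be sketched in Python as follows. This is only one possible approach, assuming the input is HTML and that `href` and `src` are the attributes of interest; `URL_ATTRS`, `UrlCollector`, `extract_urls`, and `strip_trailing_period` are hypothetical names, and the standard library's `html.parser` stands in for whatever HTML tooling you actually use.

```python
from html.parser import HTMLParser

# Attributes assumed to carry a URL (illustrative, not exhaustive).
URL_ATTRS = {"href", "src"}


class UrlCollector(HTMLParser):
    """Collect the values of URL-bearing attributes from start tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def extract_urls(html):
    """Caveat 4: only look inside tag attributes, not in free text."""
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


def strip_trailing_period(candidate):
    """Caveat 2: drop sentence-ending periods swallowed by a free-text match.

    This also strips a period that genuinely belongs to the URL.
    """
    return candidate.rstrip(".")


page = '<a href="/foo/bar/">x</a> <img src="http://www.foo.com/foo/bar.gif"> see /not/a/link.'
print(extract_urls(page))
print(strip_trailing_period("http://www.foo.com/foo/bar."))
```

Restricting the search to attribute values avoids most of the false positives that the permissive path pattern would otherwise pick up in running text.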

    Edit: whoops, fixed closing paren problem.
