Properly Matching a IDN URL

邮差的信 提交于 2019-11-27 15:52:23

John Gruber, of Daring Fireball fame, had a post recently that detailed his quest for a good URL-recognizing regex string. What he came up with was this:

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

Which apparently does OK with Unicode-containing URLs, as well. You'd need to do the slight modification to it to get the rest of what you're looking for -- the scheme, username, password, etc. Alan Storm wrote a piece explaining Gruber's regex pattern, which I definitely needed (regex is so write-once-have-no-clue-how-to-read-ever-again!).

If you require the protocol and aren't worried too much about false positives, by far the easiest thing to do is match all non-whitespace characters around ://

This will get you most of the way there. If you need it more refined please provide test data.

(ftp|https?)://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!