Youtube complete Java Regex

无人共我

I need to parse several pages to get all of their YouTube video IDs.

I found many regular expressions on the web, but the Java ones are not complete (they either give me gar

2 Answers

轮回少年

2021-02-04 12:09
Marcus above has a good regex, but I found that it doesn't recognize YouTube links that have "www" but not "http(s)" in them, for example www.youtube....

I have an update:
```
^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
```
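It's the same except for the start: the scheme (http:// or https://) is now optional, and the pattern is anchored to the beginning of the input with ^.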
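To make the difference concrete, here is a minimal sketch that compiles both variants and tries them on a link without a scheme. The class name OptionalSchemeDemo, the helper firstId, and the sample link are made up for illustration; only the two patterns come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionalSchemeDemo {
    // Everything after the scheme is identical in both answers.
    private static final String TAIL =
        "(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])"
        + "([\\w\\-]{11})(?=[^\\w\\-]|$)"
        + "(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";

    // Variant with a mandatory http:// or https:// prefix.
    static final Pattern WITH_SCHEME =
        Pattern.compile("https?:\\/\\/" + TAIL, Pattern.CASE_INSENSITIVE);

    // Updated variant: optional scheme, anchored at the start of the input.
    static final Pattern OPTIONAL_SCHEME =
        Pattern.compile("^(?:https?:\\/\\/)?" + TAIL, Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the first captured video ID, or null if nothing matches.
    static String firstId(Pattern pattern, String link) {
        Matcher matcher = pattern.matcher(link);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String link = "www.youtube.com/watch?v=dQw4w9WgXcQ"; // sample input
        System.out.println(firstId(WITH_SCHEME, link));
        System.out.println(firstId(OPTIONAL_SCHEME, link));
    }
}
```

One thing to keep in mind: because of the leading ^, the updated pattern only matches when the link is at the very start of the input, so it looks better suited to checking one link at a time than to scanning a whole page with find().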

庸人自扰

2021-02-04 12:10

First of all, you need to insert an extra backslash \ for each backslash in the old regex. Otherwise Java thinks you are escaping some other special character in the string literal, which you are not.
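As a small illustration of the doubling rule (the class name EscapeDemo and the sample strings are made up for this sketch): the Java literal "\\." reaches the regex engine as \. and matches only a real dot, while an unescaped . matches any character.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Java literal "youtu\\.be" is the regex youtu\.be (a literal dot).
    static final Pattern ESCAPED = Pattern.compile("youtu\\.be");

    // Java literal "youtu.be" is the regex youtu.be (the dot matches any character).
    static final Pattern UNESCAPED = Pattern.compile("youtu.be");

    static boolean matchesEscaped(String s) {
        return ESCAPED.matcher(s).matches();
    }

    static boolean matchesUnescaped(String s) {
        return UNESCAPED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesEscaped("youtu.be"));   // literal dot present
        System.out.println(matchesEscaped("youtuXbe"));   // X is not a dot
        System.out.println(matchesUnescaped("youtuXbe")); // any character is accepted
    }
}
```

As far as I know, the \\/ sequences in the pattern are harmless but not required: the forward slash has no special meaning in java.util.regex, so a plain / would work as well.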

https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*

Next, when you compile your pattern, you need to add the CASE_INSENSITIVE flag, because the character class [0-9A-Z-] only lists uppercase letters. Here's an example:

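import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The regex as a Java string literal: every backslash is doubled.
String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";

Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
// link is the text to search, for example the content of one page.
Matcher matcher = compiledPattern.matcher(link);
while (matcher.find()) {
    // group() is the whole matched link; group(1) is the 11-character video ID.
    System.out.println(matcher.group());
}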
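
Since the question asks for the IDs rather than the full links, here is a self-contained sketch that collects capture group 1 from every match. The class name YoutubeIdExtractor, the method extractIds, and the sample text are assumptions for illustration; the pattern is the one above, split across lines only for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class YoutubeIdExtractor {
    // Same pattern as above, compiled once and reused.
    private static final Pattern YOUTUBE_LINK = Pattern.compile(
        "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])"
        + "([\\w\\-]{11})(?=[^\\w\\-]|$)"
        + "(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*",
        Pattern.CASE_INSENSITIVE);

    // Returns the video ID (capture group 1) of every link found in the text, in order.
    static List<String> extractIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = YOUTUBE_LINK.matcher(text);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Sample text with placeholder links.
        String page = "See https://www.youtube.com/watch?v=dQw4w9WgXcQ and http://youtu.be/abcdefghijk now";
        System.out.println(extractIds(page)); // prints [dQw4w9WgXcQ, abcdefghijk]
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every page when several pages are parsed in a loop.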
