Youtube complete Java Regex

无人共我 2021-02-04 11:32

I need to parse several pages to get all of their Youtube IDs.

I found many regular expressions on the web, but the Java ones are not complete (they either give me gar

2 Answers
  • 2021-02-04 12:09

    Marcus above has a good regex, but I found that it doesn't recognize YouTube links that have "www" but not "http(s)" in them, for example www.youtube....

    I have an update:

    ^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
    

    It's the same except for the start: the pattern is anchored with ^ and the http(s):// prefix is optional.
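
A quick way to check the change is to run both variants against a link that has no scheme. This is only a sketch: it assumes the CASE_INSENSITIVE flag is used with both patterns, and the class name, method name, and sample video ID are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalSchemeDemo {
    // Shared tail of both patterns, copied from the regexes in this thread.
    static final String TAIL =
        "(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])"
        + "([\\w\\-]{11})(?=[^\\w\\-]|$)"
        + "(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";

    static final String REQUIRED_SCHEME = "https?:\\/\\/" + TAIL;
    static final String OPTIONAL_SCHEME = "^(?:https?:\\/\\/)?" + TAIL;

    // Returns the 11-character ID captured by group 1, or null if nothing matches.
    static String firstId(String regex, String text) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String link = "www.youtube.com/watch?v=dQw4w9WgXcQ"; // hypothetical sample link
        System.out.println(firstId(REQUIRED_SCHEME, link)); // prints null
        System.out.println(firstId(OPTIONAL_SCHEME, link)); // prints dQw4w9WgXcQ
        // ^ only matches at the start of the input, so a link in the middle of text is skipped.
        System.out.println(firstId(OPTIONAL_SCHEME, "see " + link)); // prints null
    }
}
```

Because of the leading ^, this variant only finds a link that starts the input. If links can appear in the middle of a page, dropping the ^ may be worth trying.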

  • 2021-02-04 12:10

    First of all, you need to insert an extra backslash \ for each backslash in the old regex; otherwise Java thinks you are escaping some other special character in the string, which you are not doing.

    https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
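
The doubling matters because the Java compiler processes string escapes before the regex engine ever sees the pattern. A sequence such as "\." is not a valid Java string escape and will not compile, while "\b" compiles but means a backspace character rather than a word boundary. A small sketch of the difference follows. The class and field names are hypothetical, and the shortened pattern is only for illustration.

```java
class BackslashDemo {
    // Regex as the engine should see it:  youtu\.be\/([\w\-]{11})
    static final String ESCAPED = "youtu\\.be\\/([\\w\\-]{11})";

    public static void main(String[] args) {
        // "\b" is a Java string escape: a single backspace character (U+0008).
        System.out.println("\b".length());  // prints 1
        // "\\b" is two characters, backslash and b, which the regex engine reads as a word boundary.
        System.out.println("\\b".length()); // prints 2

        // After compilation the string holds single backslashes.
        System.out.println(ESCAPED);        // prints youtu\.be\/([\w\-]{11})
        System.out.println("youtu.be/dQw4w9WgXcQ".matches(ESCAPED)); // prints true
    }
}
```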
    

    Next, when you compile your pattern, you need to add the CASE_INSENSITIVE flag, because the subdomain class [0-9A-Z-] lists only uppercase letters. Here's an example:

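    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";

    // CASE_INSENSITIVE lets [0-9A-Z-] also match lowercase subdomains such as "www"
    Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
    Matcher matcher = compiledPattern.matcher(link); // link: the String to search
    while (matcher.find()) {
        // group() is the whole matched link; group(1) is the 11-character video ID
        System.out.println(matcher.group());
    }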
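
Since the question asks for the IDs rather than the full links, the same loop can collect capture group 1 instead. The following is a minimal sketch, not a definitive implementation. The class name, method name, and sample text are hypothetical. The sample uses plain links because the pattern's trailing negative lookahead appears designed to skip links that already sit inside HTML tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YoutubeIdExtractor {
    // Pattern from the answer above, compiled once and reused.
    private static final Pattern YOUTUBE_LINK = Pattern.compile(
        "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])"
        + "([\\w\\-]{11})(?=[^\\w\\-]|$)"
        + "(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*",
        Pattern.CASE_INSENSITIVE);

    // Collects group 1 (the 11-character ID) of every match, in order of appearance.
    static List<String> extractIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = YOUTUBE_LINK.matcher(text);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical page text containing two plain links.
        String page = "Watch https://www.youtube.com/watch?v=dQw4w9WgXcQ "
                    + "or http://youtu.be/9bZkp7q19f0 later.";
        System.out.println(extractIds(page)); // prints [dQw4w9WgXcQ, 9bZkp7q19f0]
    }
}
```

Compiling the pattern once in a static field avoids recompiling it for every page when several pages are parsed.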
    