Parsing youtube url

前端 未结 3 1793
野的像风
野的像风 2021-02-03 13:06

I\'ve written a ruby youtube url parser. It\'s designed to take an input of a youtube url of one of the following structures (these are currently the youtube url structures tha

相关标签:
3条回答
  • 2021-02-03 13:32
    require 'uri'
    require 'cgi'
    
    urls = %w[http://youtu.be/sGE4HMvDe-Q 
              http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
              http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1]
    
    def parse_youtube url
      u = URI.parse url
      if u.path =~ /watch/
        p CGI::parse(u.query)["v"].first
      else
        p u.path
      end
    end
    
    urls.each { |url| parse_youtube url }
    #=> "/sGE4HMvDe-Q"
    #=> "Lp7E973zozc"
    #=> "/p/A0C3C1D163BE880A"
    
    0 讨论(0)
  • 2021-02-03 13:39
    def parse_youtube url
       regex = /(?:.be\/|\/watch\?v=|\/(?=p\/))([\w\/\-]+)/
       url.match(regex)[1]
    end
    
    urls = %w[http://youtu.be/sGE4HMvDe-Q 
              http://www.youtube.com/watch?v=Lp7E973zozc&feature=relmfu
              http://www.youtube.com/p/A0C3C1D163BE880A?hl=en_US&fs=1]
    
    urls.each {|url| puts parse_youtube url }
    # sGE4HMvDe-Q
    # Lp7E973zozc
    # p/A0C3C1D163BE880A
    

    Depending on how you use this, you might want a better validation that the URL is indeed from youtube.

    UPDATE:

    Coming back to this a few years later. I've always been annoyed by how sloppy the original answer was. Since the validity of the Youtube domain wasn't validated anyway, I've removed some of the slop.

    NODE                     EXPLANATION
    --------------------------------------------------------------------------------
      (?:                      group, but do not capture:
    --------------------------------------------------------------------------------
        .                        any character except \n
    --------------------------------------------------------------------------------
        be                       'be'
    --------------------------------------------------------------------------------
        \/                       '/'
    --------------------------------------------------------------------------------
       |                        OR
    --------------------------------------------------------------------------------
        \/                       '/'
    --------------------------------------------------------------------------------
        watch                    'watch'
    --------------------------------------------------------------------------------
        \?                       '?'
    --------------------------------------------------------------------------------
        v=                       'v='
    --------------------------------------------------------------------------------
       |                        OR
    --------------------------------------------------------------------------------
        \/                       '/'
    --------------------------------------------------------------------------------
        (?=                      look ahead to see if there is:
    --------------------------------------------------------------------------------
          p                        'p'
    --------------------------------------------------------------------------------
          \/                       '/'
    --------------------------------------------------------------------------------
        )                        end of look-ahead
    --------------------------------------------------------------------------------
      )                        end of grouping
    --------------------------------------------------------------------------------
      (                        group and capture to \1:
    --------------------------------------------------------------------------------
        [\w\/\-]+                any character of: word characters (a-z,
                                 A-Z, 0-9, _), '\/', '\-' (1 or more
                                 times (matching the most amount
                                 possible))
    --------------------------------------------------------------------------------
      )                        end of \1
    
    0 讨论(0)
  • 2021-02-03 13:44

    Using the Addressable gem, you can save yourself some work. There's also a URI module in stdlib, but Addressable is more powerful.

    require 'addressable/uri'
    
    uri = Addressable::URI.parse(youtube_url)
    if uri.path == "/watch"
      uri.query_values["v"] if uri.query_values
    else
      uri.path
    end
    

    EDIT | Removed madness. Didn't notice Addressable provides #query_values already.

    0 讨论(0)
提交回复
热议问题