Getting parts of a URL (Regex)

说谎 2020-11-22 02:13

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

How can I extract the following parts using regular expressions:

  1. The Subd
26 Answers
  • 2020-11-22 02:50

    Here is one that is complete and doesn't rely on any particular protocol.

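    function getServerURL(url) {
        // Optional prefix ending in "//" (e.g. "http://"), then everything
        // up to the first "/", "?", "#" or ";".
        var m = url.match("(^(?:(?:.*?)?//)?[^/?#;]*)");
        console.log(m[1]); // Debug output; remove in real use
        return m[1];
    }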
    
    getServerURL("http://dev.test.se")
    getServerURL("http://dev.test.se/")
    getServerURL("//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js")
    getServerURL("//")
    getServerURL("www.dev.test.se/sdas/dsads")
    getServerURL("www.dev.test.se/")
    getServerURL("www.dev.test.se?abc=32")
    getServerURL("www.dev.test.se#abc")
    getServerURL("//dev.test.se?sads")
    getServerURL("http://www.dev.test.se#321")
    getServerURL("http://localhost:8080/sads")
    getServerURL("https://localhost:8080?sdsa")
    

    Prints:

    http://dev.test.se
    http://dev.test.se
    //ajax.googleapis.com
    //
    www.dev.test.se
    www.dev.test.se
    www.dev.test.se
    www.dev.test.se
    //dev.test.se
    http://www.dev.test.se
    http://localhost:8080
    https://localhost:8080
    
  • 2020-11-22 02:51

    I needed a regular expression to match all URLs and came up with this one:

    /(?:([^\:]*)\:\/\/)?(?:([^\:\@]*)(?:\:([^\@]*))?\@)?(?:([^\/\:]*)\.(?=[^\.\/\:]*\.[^\.\/\:]*))?([^\.\/\:]*)(?:\.([^\/\.\:]*))?(?:\:([0-9]*))?(\/[^\?#]*(?=.*?\/)\/)?([^\?#]*)?(?:\?([^#]*))?(?:#(.*))?/
    

    It matches URLs with any protocol, even ones like

    ftp://user:pass@www.cs.server.com:8080/dir1/dir2/file.php?param1=value1#hashtag
    

    The captured groups (in JavaScript) look like this:

    ["ftp", "user", "pass", "www.cs", "server", "com", "8080", "/dir1/dir2/", "file.php", "param1=value1", "hashtag"]
    

    A URL like

    mailto://admin@www.cs.server.com
    

    produces:

    ["mailto", "admin", undefined, "www.cs", "server", "com", undefined, undefined, undefined, undefined, undefined] 
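
    To make the eleven groups easier to work with, they can be unpacked into named fields. This is only a sketch: splitUrl and the property names are labels I made up for illustration, and the regex is the one above, unchanged.

```javascript
// The regex from the answer above, unchanged.
const URL_PARTS = /(?:([^\:]*)\:\/\/)?(?:([^\:\@]*)(?:\:([^\@]*))?\@)?(?:([^\/\:]*)\.(?=[^\.\/\:]*\.[^\.\/\:]*))?([^\.\/\:]*)(?:\.([^\/\.\:]*))?(?:\:([0-9]*))?(\/[^\?#]*(?=.*?\/)\/)?([^\?#]*)?(?:\?([^#]*))?(?:#(.*))?/;

// Hypothetical helper: the property names are labels chosen for this sketch.
function splitUrl(url) {
    // match() puts the whole match at index 0; the capture groups follow,
    // so the first array slot is skipped when destructuring.
    const [, scheme, user, password, subdomain, domain, tld, port,
        directory, file, query, fragment] = url.match(URL_PARTS);
    return { scheme, user, password, subdomain, domain, tld, port,
        directory, file, query, fragment };
}

const parts = splitUrl("ftp://user:pass@www.cs.server.com:8080/dir1/dir2/file.php?param1=value1#hashtag");
console.log(parts.subdomain, parts.domain, parts.tld); // www.cs server com
```

    Groups that do not take part in the match come back as undefined, so callers should check for that before using a field.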
    
  • 2020-11-22 02:51

    A regexp to get the URL path without the file name.

    url = 'http://domain/dir1/dir2/somefile'
    url.scan(%r{^(http://[^/]+)((?:/[^/]+)+(?=/))?/?(?:[^/]+)?$}i).to_s

    This can be useful for appending a relative path to the URL.
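
    For readers working in JavaScript, the same pattern can be ported roughly as follows. Treat it as a sketch: directoryOf is a name I made up, and like the original pattern it only accepts http:// URLs.

```javascript
// JavaScript port of the pattern above; it still only accepts http:// URLs.
const DIR_PATTERN = /^(http:\/\/[^\/]+)((?:\/[^\/]+)+(?=\/))?\/?(?:[^\/]+)?$/i;

// Hypothetical helper: returns the URL up to and including the last
// directory slash, or null when the pattern does not match.
function directoryOf(url) {
    const m = url.match(DIR_PATTERN);
    if (m === null) return null;
    // m[1] is "http://host", m[2] is the directory part (undefined if none).
    return m[1] + (m[2] || "") + "/";
}

const base = directoryOf("http://domain/dir1/dir2/somefile");
console.log(base + "other.html"); // http://domain/dir1/dir2/other.html
```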

  • 2020-11-22 02:54

    You can get the scheme (http/https), host, port, path, and query by using the Uri object in .NET. The difficult part is breaking the host into subdomain, domain name, and TLD.

    There is no standard way to do so, and simple string parsing or a regex cannot produce the correct result in every case. At first I used a regex, but it did not extract the subdomain correctly for every URL. The practical approach is to use a list of TLDs: once the TLD of a URL is identified, the label to its left is the domain and whatever remains is the subdomain.

    However, the list needs to be maintained, since new TLDs can be added. As far as I know, publicsuffix.org maintains the latest list, and you can use the domainname-parser tool from Google Code to parse the public suffix list and get the subdomain, domain, and TLD easily through the DomainName object: domainName.SubDomain, domainName.Domain and domainName.TLD.

    This answer is also helpful: Get the subdomain from a URL
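
    The list-based splitting described above can be sketched in a few lines of JavaScript. Everything here is an assumption made for illustration: splitHost is a made-up name, and the four-entry SUFFIXES set is a tiny stand-in for the real list from publicsuffix.org, whose wildcard and exception rules this sketch ignores.

```javascript
// Tiny stand-in for the Public Suffix List; a real program would load the
// full, current list instead of hard-coding a few entries.
const SUFFIXES = new Set(["com", "org", "se", "co.uk"]);

// Hypothetical helper: split a host name into subdomain, domain and TLD.
function splitHost(host) {
    const labels = host.toLowerCase().split(".");
    // Try the longest candidate suffix first so that a multi-label suffix
    // such as "co.uk" is found before any shorter one.
    for (let i = 1; i < labels.length; i++) {
        const suffix = labels.slice(i).join(".");
        if (SUFFIXES.has(suffix)) {
            return {
                subdomain: labels.slice(0, i - 1).join("."),
                domain: labels[i - 1],
                tld: suffix,
            };
        }
    }
    return null; // no known suffix, e.g. "localhost"
}

console.log(splitHost("test.example.com"));
// { subdomain: 'test', domain: 'example', tld: 'com' }
```

    The lookup is deliberately list-driven rather than regex-driven: a host such as test.example.co.uk only splits correctly because "co.uk" is in the list.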

    CaLLMeLaNN

  • 2020-11-22 02:54

    I tried this regex for parsing the parts of a URL:

    ^((http[s]?|ftp):\/)?\/?([^:\/\s]+)(:([^\/]*))?((\/?(?:[^\/\?#]+\/+)*)([^\?#]*))(\?([^#]*))?(#(.*))?$
    

    URL: https://www.google.com/my/path/sample/asd-dsa/this?key1=value1&key2=value2

    Matches:

    Group 1.    0-7     https:/
    Group 2.    0-5     https
    Group 3.    8-22    www.google.com
    Group 6.    22-50   /my/path/sample/asd-dsa/this
    Group 7.    22-46   /my/path/sample/asd-dsa/
    Group 8.    46-50   this
    Group 9.    50-74   ?key1=value1&key2=value2
    Group 10.   51-74   key1=value1&key2=value2
    
  • 2020-11-22 02:59

    None of the above worked for me. Here's what I ended up using:

    /^(?:((?:https?|s?ftp):)\/\/)([^:\/\s]+)(?::(\d*))?(?:\/([^\s?#]+)?([?][^?#]*)?(#.*)?)?/
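
    Since the groups are not described, here is a rough sketch of what each one captures. The parse function and the property names are made up for illustration; the regex is the one above, unchanged.

```javascript
// The regex from the answer above, unchanged.
const URL_RE = /^(?:((?:https?|s?ftp):)\/\/)([^:\/\s]+)(?::(\d*))?(?:\/([^\s?#]+)?([?][^?#]*)?(#.*)?)?/;

// Hypothetical helper: the property names are labels chosen for this sketch.
function parse(url) {
    const m = url.match(URL_RE);
    if (m === null) return null; // the scheme is mandatory in this pattern
    return {
        protocol: m[1], // includes the trailing ":"
        host: m[2],
        port: m[3],
        path: m[4],     // without the leading "/"
        search: m[5],   // includes the leading "?"
        hash: m[6],     // includes the leading "#"
    };
}

console.log(parse("https://localhost:8080/sads?x=1#top"));
// { protocol: 'https:', host: 'localhost', port: '8080',
//   path: 'sads', search: '?x=1', hash: '#top' }
```

    Two things to be aware of: input without a scheme (such as www.dev.test.se/) does not match at all, and the query and fragment are only captured when a "/" follows the host, so https://localhost:8080?sdsa yields the host and port but no search value.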
    