Getting parts of a URL (Regex)

后端 未结 26 2204
说谎
说谎 2020-11-22 02:13

Given the URL (single line):
http://test.example.com/dir/subdir/file.html

How can I extract the following parts using regular expressions:

  1. The Subd
26条回答
  •  旧巷少年郎
    2020-11-22 02:54

    You can get all the http/https, host, port, path as well as query by using Uri object in .NET. just the difficult task is to break the host into sub domain, domain name and TLD.

    There is no standard to do so and can't be simply use string parsing or RegEx to produce the correct result. At first, I am using RegEx function but not all URL can be parse the subdomain correctly. The practice way is to use a list of TLDs. After a TLD for a URL is defined the left part is domain and the remaining is sub domain.

    However the list need to maintain it since new TLDs is possible. The current moment I know is publicsuffix.org maintain the latest list and you can use domainname-parser tools from google code to parse the public suffix list and get the sub domain, domain and TLD easily by using DomainName object: domainName.SubDomain, domainName.Domain and domainName.TLD.

    This answers also helpfull: Get the subdomain from a URL

    CaLLMeLaNN

提交回复
热议问题