Get specific subdomain from URL in foo.bar.car.com

清酒与你 2020-11-30 10:19

Given a URL as follows:

foo.bar.car.com.au

I need to extract foo.bar.

I came across the following code:

pr         


        
7 Answers
  • 2020-11-30 10:30

    In addition to the NuGet Nager.PublicSuffix package suggested in another answer, there is also the NuGet Louw.PublicSuffix package. According to its GitHub project page, it is a .NET Core library that parses the Public Suffix List and is based on the Nager.PublicSuffix project, with the following changes:

    • Ported to a .NET Core library.
    • Fixed the library so that it passes ALL of the comprehensive tests.
    • Refactored classes to split functionality into smaller, focused classes.
    • Made classes immutable, so DomainParser can be used as a singleton and is thread-safe.
    • Added WebTldRuleProvider and FileTldRuleProvider.
    • Added the ability to tell whether a rule is an ICANN or a private domain rule.
    • Uses the async programming model.

    The page also states that many of the above changes were submitted back to the original Nager.PublicSuffix project.

  • 2020-11-30 10:35

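    I would recommend using a regular expression. The following snippet should extract what you are looking for:

    using System.Text.RegularExpressions;

    string input = "foo.bar.car.com.au";
    // Match the first two dot-separated labels; [^.]+ (unlike \w) also accepts hyphens.
    var match = Regex.Match(input, @"^[^.]+\.[^.]+");
    var output = match.Value; // "foo.bar"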
    
  • 2020-11-30 10:37

    OK, first: are you specifically looking at 'com.au', or are these general Internet domain names? If it's the latter, there is simply no automatic way to determine, from the name alone, how much of the domain is a "site" or "zone" and how much is an individual "host" or other record within that zone.

    If you need to be able to figure that out from an arbitrary domain name, you will want to grab the list of public suffixes (effective TLDs such as com.au, not just true TLDs) from the Mozilla Public Suffix project (http://publicsuffix.org) and use its algorithm to find the public suffix of your domain name. The registrable domain is then that suffix plus the one label immediately before it (car.com.au in your example), and everything to its left (foo.bar) is the part you want.
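    The Public Suffix List algorithm essentially looks for the longest list rule that matches the end of the host name. As a rough illustration only, here is a minimal C# sketch of that lookup. It assumes a tiny hard-coded excerpt of the list (the PublicSuffixSketch class and its Suffixes set are hypothetical names), and it ignores the list's wildcard (*.) and exception (!) rules, so it is not a substitute for the full algorithm or a maintained library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PublicSuffixSketch
{
    // Hypothetical, hard-coded excerpt of the Public Suffix List.
    // A real implementation would load the full list from publicsuffix.org
    // and also handle wildcard (*.) and exception (!) rules, which this sketch ignores.
    private static readonly HashSet<string> Suffixes = new HashSet<string>(
        new[] { "com", "au", "com.au", "uk", "co.uk" },
        StringComparer.OrdinalIgnoreCase);

    // Returns the labels to the left of the registrable domain, or null if there are none.
    public static string GetSubdomain(string host)
    {
        string[] labels = host.TrimEnd('.').Split('.');

        // Find the longest suffix rule that matches the end of the host name.
        int suffixLength = 0;
        for (int i = 0; i < labels.Length; i++)
        {
            string candidate = string.Join(".", labels.Skip(i));
            if (Suffixes.Contains(candidate))
            {
                suffixLength = labels.Length - i;
                break; // A smaller i means a longer candidate, so the first hit is the longest.
            }
        }

        // Default rule: if nothing matched, treat the last label as the suffix.
        if (suffixLength == 0)
            suffixLength = 1;

        // Registrable domain = suffix + one more label; everything before that is the subdomain.
        int subdomainLabels = labels.Length - suffixLength - 1;
        if (subdomainLabels <= 0)
            return null;

        return string.Join(".", labels.Take(subdomainLabels));
    }
}
```

    With this excerpt, GetSubdomain("foo.bar.car.com.au") returns "foo.bar" because com.au is the longest matching suffix. If the com.au entry were missing, the lookup would fall back to au and return "foo.bar.car", which is why the complete, up-to-date list matters.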

  • 2020-11-30 10:42

    Given your requirement (you want the first two levels, not including 'www.'), I'd approach it something like this:

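    private static string GetSubDomain(Uri url)
    {
        // Only DNS host names can be split into labels; IP addresses cannot.
        if (url.HostNameType != UriHostNameType.Dns)
            return null;

        string[] nodes = url.Host.Split('.');

        // Skip a leading "www" label, if present.
        int startNode = nodes[0] == "www" ? 1 : 0;

        // Hosts with too few labels (e.g. "localhost" or "www.example") have no two labels to return.
        if (nodes.Length < startNode + 2)
            return null;

        return string.Format("{0}.{1}", nodes[startNode], nodes[startNode + 1]);
    }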
    
  • 2020-11-30 10:45

    I faced a similar problem and, based on the preceding answers, wrote this extension method. Most importantly, it takes a parameter that defines the "root" domain, i.e. whatever the consumer of the method considers to be the root. In the OP's case, the calls would be:

    var uri = new Uri("http://foo.bar.car.com.au");
    uri.DnsSafeHost.GetSubdomain("car.com.au"); // returns "foo.bar"
    uri.DnsSafeHost.GetSubdomain(); // returns "foo.bar.car"
    

    Here's the extension method:

    /// <summary>Gets the subdomain portion of a host name, given a known "root" domain</summary>
    public static string GetSubdomain(this string url, string domain = null)
    {
      var subdomain = url;
      if (subdomain != null)
      {
        if (domain == null)
        {
          // Since we were not provided with a known domain, assume that the second-to-last period divides the subdomain from the domain.
          var nodes = url.Split('.');
          var lastNodeIndex = nodes.Length - 1;
          if (lastNodeIndex > 0)
            domain = nodes[lastNodeIndex - 1] + "." + nodes[lastNodeIndex];
          else
            return null; // A single label (e.g. "localhost") has no subdomain.
        }

        // Verify that what we think is the domain is truly the ending of the hostname... otherwise we're hooped.
        if (!subdomain.EndsWith(domain, StringComparison.OrdinalIgnoreCase))
          throw new ArgumentException("Site was not loaded from the expected domain");

        // Cut the domain off the end only (string.Replace would remove every occurrence),
        // which leaves the subdomain and a trailing dot IF there is a subdomain.
        subdomain = subdomain.Substring(0, subdomain.Length - domain.Length);
        // Check if we have anything left. If we don't, there was no subdomain; the request was directly to the root domain.
        if (string.IsNullOrWhiteSpace(subdomain))
          return null;

        // The remainder must end with a dot; otherwise the domain only matched part of a label (e.g. "xcar.com.au").
        if (!subdomain.EndsWith(".", StringComparison.Ordinal))
          throw new ArgumentException("Site was not loaded from the expected domain");

        // Quash the trailing period.
        subdomain = subdomain.TrimEnd('.');
      }

      return subdomain;
    }
    
  • 2020-11-30 10:49

    You can use the NuGet package Nager.PublicSuffix. It uses Mozilla's Public Suffix List to split the domain.

    PM> Install-Package Nager.PublicSuffix
    

    Example

     var domainParser = new DomainParser();
     var data = await domainParser.LoadDataAsync();
     var tldRules = domainParser.ParseRules(data);
     domainParser.AddRules(tldRules);
    
     var domainName = domainParser.Get("sub.test.co.uk");
     //domainName.Domain = "test";
     //domainName.Hostname = "sub.test.co.uk";
     //domainName.RegistrableDomain = "test.co.uk";
     //domainName.SubDomain = "sub";
     //domainName.TLD = "co.uk";
    