Given a URL as follows:
foo.bar.car.com.au
I need to extract foo.bar
.
I came across the following code :
pr
In addition to the NuGet Nager.PubilcSuffix package specified in this answer, there is also the NuGet Louw.PublicSuffix package, which according to its GitHub project page is a .Net Core Library that parses Public Suffix, and is based on the Nager.PublicSuffix project, with the following changes:
DomainParser
can be used as singleton and is thread safe.WebTldRuleProvider
and FileTldRuleProvider
.The page also states that many of above changes were submitted back to original Nager.PublicSuffix project.
I would recommend using Regular Expression. The following code snippet should extract what you are looking for...
string input = "foo.bar.car.com.au";
var match = Regex.Match(input, @"^\w*\.\w*\.\w*");
var output = match.Value;
OK, first. Are you specifically looking in 'com.au', or are these general Internet domain names? Because if it's the latter, there is simply no automatic way to determine how much of the domain is a "site" or "zone" or whatever and how much is an individual "host" or other record within that zone.
If you need to be able to figure that out from an arbitrary domain name, you will want to grab the list of TLDs from the Mozilla Public Suffix project (http://publicsuffix.org) and use their algorithm to find the TLD in your domain name. Then you can assume that the portion you want ends with the last label immediately before the TLD.
Given your requirement (you want the 1st two levels, not including 'www.') I'd approach it something like this:
private static string GetSubDomain(Uri url)
{
if (url.HostNameType == UriHostNameType.Dns)
{
string host = url.Host;
var nodes = host.Split('.');
int startNode = 0;
if(nodes[0] == "www") startNode = 1;
return string.Format("{0}.{1}", nodes[startNode], nodes[startNode + 1]);
}
return null;
}
I faced a similar problem and, based on the preceding answers, wrote this extension method. Most importantly, it takes a parameter that defines the "root" domain, i.e. whatever the consumer of the method considers to be the root. In the OP's case, the call would be
Uri uri = "foo.bar.car.com.au";
uri.DnsSafeHost.GetSubdomain("car.com.au"); // returns foo.bar
uri.DnsSafeHost.GetSubdomain(); // returns foo.bar.car
Here's the extension method:
/// <summary>Gets the subdomain portion of a url, given a known "root" domain</summary>
public static string GetSubdomain(this string url, string domain = null)
{
var subdomain = url;
if(subdomain != null)
{
if(domain == null)
{
// Since we were not provided with a known domain, assume that second-to-last period divides the subdomain from the domain.
var nodes = url.Split('.');
var lastNodeIndex = nodes.Length - 1;
if(lastNodeIndex > 0)
domain = nodes[lastNodeIndex-1] + "." + nodes[lastNodeIndex];
}
// Verify that what we think is the domain is truly the ending of the hostname... otherwise we're hooped.
if (!subdomain.EndsWith(domain))
throw new ArgumentException("Site was not loaded from the expected domain");
// Quash the domain portion, which should leave us with the subdomain and a trailing dot IF there is a subdomain.
subdomain = subdomain.Replace(domain, "");
// Check if we have anything left. If we don't, there was no subdomain, the request was directly to the root domain:
if (string.IsNullOrWhiteSpace(subdomain))
return null;
// Quash any trailing periods
subdomain = subdomain.TrimEnd(new[] {'.'});
}
return subdomain;
}
You can use the following nuget package Nager.PublicSuffix. It uses the PUBLIC SUFFIX LIST from Mozilla to split the domain.
PM> Install-Package Nager.PublicSuffix
Example
var domainParser = new DomainParser();
var data = await domainParser.LoadDataAsync();
var tldRules = domainParser.ParseRules(data);
domainParser.AddRules(tldRules);
var domainName = domainParser.Get("sub.test.co.uk");
//domainName.Domain = "test";
//domainName.Hostname = "sub.test.co.uk";
//domainName.RegistrableDomain = "test.co.uk";
//domainName.SubDomain = "sub";
//domainName.TLD = "co.uk";