Need regex to get domain + subdomain

Submitted by 萝らか妹 on 2019-12-24 06:19:23

Question


So I'm using this function here:

function get_domain($url)
{
  $pieces = parse_url($url);
  $domain = isset($pieces['host']) ? $pieces['host'] : '';
  if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
    return $regs['domain'];
  }
  return false;
}

$referer = get_domain($_SERVER['HTTP_REFERER']);

What I need is another regex for it, if someone would be so kind as to help. Specifically, I need it to return the whole domain, including subdomains.

Here is the real problem I have now: when people link from a blog such as myblog.blogger.com, the referer comes back as just blogger.com, which is not ideal.

So if someone could help me with a regex for the function above that includes the subdomain, I'd appreciate it a lot!

Thanks!


Answer 1:


This regex should match a domain in a string, including any subdomains:

/([a-z0-9-]+\.)*[a-z0-9-]+\.[a-z]+/

Translated to rough English, it works like this: match the first part of the string that looks like "sometextornumbers.sometext", and also include any number of "sometextornumbers." segments that precede it.

See it in action here: http://regexr.com?2vppk

Note that the multiline and global flags in that link are only there so the pattern can match the entire blob of test text; you don't need them if you're passing only one line to the regex.
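As a minimal sketch of how a pattern like this might be dropped into the question's function: the name get_domain_with_subdomains() is hypothetical, the `i` flag is an added assumption so that upper-case hosts also match, and the character classes are written without `|`, since inside brackets that would match a literal pipe.

```php
<?php
// Sketch only: a variant of the question's get_domain() that keeps subdomains.
// The function name and the /i flag are assumptions, not part of the answer.
function get_domain_with_subdomains(string $url)
{
    // parse_url() returns null when the URL has no host component
    // and false when the URL is seriously malformed.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // Zero or more "label." groups, then "label.tld".
    if (preg_match('/([a-z0-9-]+\.)*[a-z0-9-]+\.[a-z]+/i', $host, $regs)) {
        return $regs[0]; // the whole match, subdomains included
    }
    return false;
}

echo get_domain_with_subdomains('http://myblog.blogger.com/2012/01/post.html'), PHP_EOL;
// myblog.blogger.com
```

Since parse_url() already returns the full host, subdomains included, the regex here mostly acts as a sanity check on that value.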




Answer 2:


Good luck with the above, as domain names can now contain non-Roman characters. These have to be converted into an equivalent, unique ASCII form before a regex can work reliably. See RFC 3490, Internationalizing Domain Names in Applications (IDNA), at https://tools.ietf.org/html/rfc3490, which states:

Until now, there has been no standard method for domain names to use
characters outside the ASCII repertoire. This document defines
internationalized domain names (IDNs) and a mechanism called
Internationalizing Domain Names in Applications (IDNA) for handling
them in a standard fashion. IDNs use characters drawn from a large
repertoire (Unicode), but IDNA allows the non-ASCII characters to be
represented using only the ASCII characters already allowed in so-
called host names today. This backward-compatible representation is
required in existing protocols like DNS, so that IDNs can be
introduced with no changes to the existing infrastructure. IDNA is
only meant for processing domain names, not free text.
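One way to apply this in PHP, sketched under assumptions: to_ascii_host() is a hypothetical helper, and idn_to_ascii() is only available when the intl extension is loaded, so the sketch falls back to the unchanged host when it is not. The pattern is an anchored, case-insensitive ASCII pattern chosen for illustration.

```php
<?php
// Hypothetical helper (not from the answers): convert a host to its ASCII
// (Punycode) form before running an ASCII-only regex over it.
function to_ascii_host(string $host): string
{
    // idn_to_ascii() comes from PHP's intl extension; skip it if unavailable.
    if (function_exists('idn_to_ascii')) {
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii !== false) {
            return $ascii;
        }
    }
    return $host; // fallback: leave the host as it was
}

// Illustrative ASCII-only pattern, anchored to the whole string.
$pattern = '/^([a-z0-9-]+\.)*[a-z0-9-]+\.[a-z]+$/i';

// The raw Unicode label contains bytes outside a-z, 0-9 and "-".
var_dump(preg_match($pattern, 'bücher.example') === 1);        // bool(false)

// "xn--bcher-kva" is the Punycode form of "bücher" and is plain ASCII.
var_dump(preg_match($pattern, 'xn--bcher-kva.example') === 1); // bool(true)

// With intl loaded, the converted host should match as well.
var_dump(preg_match($pattern, to_ascii_host('bücher.example')) === 1);
```

The result of the last line depends on whether intl is installed, so no output is shown for it.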




Answer 3:


Better solution:

/^([a-z0-9-]+[a-z0-9]{1,}\.)*[a-z0-9-]+[a-z0-9]{1,}\.[a-z]{2,}$/

Regex sample: https://regexr.com/4k71a
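Because this pattern is anchored with `^` and `$`, it behaves as a validator for a whole string rather than an extractor. A small sketch of that behaviour follows; looks_like_hostname() is a hypothetical name, the `i` flag is an added assumption, and the character classes are written without `|` so a literal pipe is not accepted.

```php
<?php
// Hypothetical helper: true only when the entire string matches the pattern.
function looks_like_hostname(string $host): bool
{
    // Each label needs at least two characters and must end in a letter or
    // digit; the final label (the TLD) must be two or more letters.
    $pattern = '/^([a-z0-9-]+[a-z0-9]{1,}\.)*[a-z0-9-]+[a-z0-9]{1,}\.[a-z]{2,}$/i';
    return preg_match($pattern, $host) === 1;
}

var_dump(looks_like_hostname('myblog.blogger.com'));         // bool(true)
var_dump(looks_like_hostname('myblog-.blogger.com'));        // bool(false): label ends in "-"
var_dump(looks_like_hostname('see myblog.blogger.com now')); // bool(false): anchors reject extra text
var_dump(looks_like_hostname('x.blogger.com'));              // bool(false): one-character label
```

The last case shows a side effect worth knowing about: single-character labels are rejected, because every label is required to have at least two characters.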

And for an email address:

/^[a-z0-9.-]+[a-z0-9]{1,}@([a-z0-9-]+[a-z0-9]{1,}\.)*[a-z0-9-]+[a-z0-9]{1,}\.[a-z]{2,}$/


Source: https://stackoverflow.com/questions/8959765/need-regex-to-get-domain-subdomain
