What are surrogate characters in UTF-8?
问题 I have a strange validation program that validates wheather a utf-8 string is a valid host name(Zend Framework Hostname valdiator in PHP). It allows IDNs(internationalized domain names). It will compare each subdomain with sets of characters defined by their HEX bytes representation. Two such sets are D800-DB7F and DC00-DFFF . Php regexp comparing function called preg_match fails during these comparsions and it says that DC00-DFFF characters are not allowed in this function. From wikipedia I