Question
I have a string and want to remove every character that meets all of the conditions below:
not in this list : ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂ ưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ
not in [a-z 0-9 A-Z]
not an underscore (_) or whitespace.
Can anyone help me with this regex in PHP?
Answer 1:
Try this regular expression:
/[^a-z0-9A-Z_ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ]/u
The u modifier makes PHP interpret the pattern string as UTF-8.
If that doesn't work, try using Unicode character properties like \p{L} for letters, or the escape sequence \x{1234} for describing single Unicode characters or custom character ranges:
/[^a-z0-9A-Z_\x{00C0}-\x{00FF}\x{1EA0}-\x{1EFF}]/u
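A minimal sketch of how a range-based pattern like this might be applied with preg_replace. The function name stripNonVietnamese is hypothetical. Two things are added as assumptions: \s, because the question also wants whitespace kept, and the code points for Ă, Đ, Ĩ, Ũ, Ơ, Ư (U+0102 to U+01B0), which fall outside both ranges. Note that the ranges are broader than the exact list in the question (for example, ä and ß are also kept).

```php
<?php
// Hypothetical helper: keeps ASCII letters, digits, underscore, whitespace,
// and the code point ranges from the pattern above; removes everything else.
function stripNonVietnamese(string $text): string
{
    $pattern = '/[^a-zA-Z0-9_\s'
        . '\x{00C0}-\x{00FF}'   // Latin-1 range; broader than the question's list
        . '\x{0102}\x{0103}'    // Ă ă
        . '\x{0110}\x{0111}'    // Đ đ
        . '\x{0128}\x{0129}'    // Ĩ ĩ
        . '\x{0168}\x{0169}'    // Ũ ũ
        . '\x{01A0}\x{01A1}'    // Ơ ơ
        . '\x{01AF}\x{01B0}'    // Ư ư
        . '\x{1EA0}-\x{1EFF}'   // Latin Extended Additional (Ạ ... ỹ)
        . ']/u';

    // preg_replace returns null on failure (e.g. invalid UTF-8 input);
    // returning an empty string in that case is an assumption of this sketch.
    $result = preg_replace($pattern, '', $text);
    return $result ?? '';
}

echo stripNonVietnamese("Ti\u{1EBF}ng Vi\u{1EC7}t_2024! #1"), "\n";
```

The "\u{...}" escapes need PHP 7.0 or later; they are used here only so the example does not depend on how the source file is encoded.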
Answer 2:
Be careful. Vietnamese Unicode characters may be "decomposed" into "combining characters", with one code point for the base character and one or more code points for additional diacritics, or they may be "precomposed" into single Unicode code points. Combining diacritics won't work as expected inside a regular expression character class [], since the class will match them no matter what base character they combine with.
Older versions of Unicode did not contain the full set of Vietnamese precomposed characters so expect to find Vietnamese with combining characters in the wild. You can convert combining characters into precomposed characters using Unicode normalization form C, NFC.
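To illustrate the difference, here is a small sketch that writes "ế" both ways and runs each through a character class that only lists precomposed letters. It assumes PHP 7.0+ and, for the normalization step, the intl extension (which provides the Normalizer class). The helper name toNfc and its fallback behaviour are hypothetical.

```php
<?php
// "ế" written two ways: one precomposed code point, and
// e + combining circumflex + combining acute.
$precomposed = "\u{1EBF}";
$decomposed  = "e\u{0302}\u{0301}";

// Hypothetical helper: normalize to NFC when intl is available.
function toNfc(string $text): string
{
    if (!class_exists('Normalizer')) {
        return $text; // assumption: fall back to the raw text without intl
    }
    $normalized = Normalizer::normalize($text, Normalizer::FORM_C);
    return $normalized === false ? $text : $normalized;
}

// A character class sees individual code points, so the combining marks
// of the decomposed form are treated as separate characters and removed.
$pattern = '/[^a-zA-Z\x{1EA0}-\x{1EFF}]/u';

$withoutNfc = preg_replace($pattern, '', $decomposed);        // plain "e"
$withNfc    = preg_replace($pattern, '', toNfc($decomposed)); // "ế" if intl is loaded
```

Normalizing first means the filter only ever has to deal with the precomposed forms.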
Answer 3:
The regexes above are missing ế; also, ă and ề are duplicated.
List of correct Vietnamese characters:
àáãạảăắằẳẵặâấầẩẫậèéẹẻẽêềếểễệđìíĩỉịòóõọỏôốồổỗộơớờởỡợùúũụủưứừửữựỳỵỷỹýÀÁÃẠẢĂẮẰẲẴẶÂẤẦẨẪẬÈÉẸẺẼÊỀẾỂỄỆĐÌÍĨỈỊÒÓÕỌỎÔỐỒỔỖỘƠỚỜỞỠỢÙÚŨỤỦƯỨỪỬỮỰỲỴỶỸÝ
Also, remember to normalize the string to NFC form before testing it with the regex. In JavaScript that is string.normalize('NFC'); in PHP, the intl extension provides Normalizer::normalize().
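A character list this long is easy to get wrong by hand, so it may help to check it programmatically. The sketch below is one possible way to find duplicated characters; the helper name duplicatedChars is hypothetical, the arrow function needs PHP 7.4+, and the sample string is only a four-character excerpt (Ẽ Ề Ề Ể) written with escapes, not the full list.

```php
<?php
// Hypothetical helper: list the characters that occur more than once
// in a UTF-8 string.
function duplicatedChars(string $chars): array
{
    // Split into code points; the u modifier keeps multi-byte characters intact.
    $points = preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY);
    if ($points === false) {
        return []; // invalid UTF-8 input
    }
    $counts = array_count_values($points);
    $dupes  = array_keys(array_filter($counts, fn (int $n): bool => $n > 1));
    // Digit characters become integer keys, so cast back to strings.
    return array_map('strval', $dupes);
}

// Excerpt: Ẽ Ề Ề Ể (Ề appears twice, Ế does not appear at all).
$excerpt = "\u{1EBC}\u{1EC0}\u{1EC0}\u{1EC2}";

$duplicates = duplicatedChars($excerpt);                 // contains only Ề
$hasAcuteE  = strpos($excerpt, "\u{1EBE}") !== false;    // false: Ế is missing
```

Running the same check over a full character class before using it would catch the duplicates described above; a missing letter still has to be checked against a known-good list.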
Answer 4:
$newtext = preg_replace('/[^a-z0-9A-Z_[:space:]ÀÁÂÃÈÉÊÌÍÒÓÔÕÙÚĂĐĨŨƠàáâãèéêìíòóôõùúăđĩũơƯĂẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼỀỀỂ ưăạảấầẩẫậắằẳẵặẹẻẽềềểỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪễệỉịọỏốồổỗộớờởỡợụủứừỬỮỰỲỴÝỶỸửữựỳỵỷỹ]/u','',$text);
Source: https://stackoverflow.com/questions/3819791/regex-in-vietnamese-characters