Regex to remove non-alphanumeric characters from UTF-8 strings

傲寒 2021-01-11 11:36

How can I remove characters such as punctuation, commas, and dashes from a string in a multibyte-safe manner?

I will be working with input from many different languages.

4 answers
  •  北荒 (original poster)
    2021-01-11 12:21

    You can use Unicode character properties in the regex:

    • http://www.regular-expressions.info/unicode.html
    • http://php.net/manual/en/regexp.reference.unicode.php

    To match any run of non-letter characters, use \PL+, the negation of \p{L}. To keep whitespace, use a character class such as [^\pL\s]+. Or, to remove only punctuation, use \pP+.

    And don't forget the regex /u modifier, which makes the pattern and the subject string be treated as UTF-8.
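
The patterns above could be applied in PHP roughly as follows. This is only a sketch: the function name stripNonAlphanumeric is made up for illustration, and \pN is added to the character class on the assumption that digits should be kept, since the question asks about non-alphanumeric characters.

```php
<?php
// Hypothetical helper, not part of the original answer.
function stripNonAlphanumeric(string $text): string
{
    // \pL = any letter, \pN = any number, \s = whitespace.
    // Everything outside those classes is removed.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[^\pL\pN\s]+/u', '', $text);

    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// Removing only punctuation, as with \pP+ in the answer:
$noPunctuation = preg_replace('/\pP+/u', '', 'a-b, c!');

echo stripNonAlphanumeric('Héllo, wörld! 你好,世界。 123-456'), "\n";
// prints "Héllo wörld 你好世界 123456"
echo $noPunctuation, "\n";
// prints "ab c"
```

Note that \pL does not cover combining marks, so if the input may contain decomposed characters (a base letter followed by a separate accent), adding \pM to the character class would keep the accents.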
