Matching UTF Characters with preg_match in PHP: (*UTF8) Works on Windows but not Linux

I have a simple regular expression to check a username:

preg_match('/(*UTF8)^[[:alnum:]]([[:alnum:]]|[ _.-])+$/i', $username);

In local t

3 Answers

盖世英雄少女心

2020-12-10 08:10
Try describing the characters by their Unicode character properties:
```
preg_match('/^\p{L}[\p{L} _.-]+$/u', $username)
```
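Note that `\p{L}` matches letters only, whereas the question's `[[:alnum:]]` also allows digits. A minimal sketch of a variant that adds `\p{N}` (Unicode numbers) to stay closer to the original intent follows. The helper name `isValidUsername` is hypothetical, and the type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helper for illustration; not part of the original question.
function isValidUsername(string $username): bool
{
    // First character: a letter or digit in any script.
    // Remaining characters (one or more): letters, digits, space, underscore, dot, hyphen.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('/^[\p{L}\p{N}][\p{L}\p{N} _.-]+$/u', $username) === 1;
}

var_dump(isValidUsername('Jürgen_99')); // bool(true)
var_dump(isValidUsername('_admin'));    // bool(false): starts with an underscore
```

Comparing against `1` also treats a `false` return (for example, when the subject is not valid UTF-8) as a rejected username.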
生来不讨喜

2020-12-10 08:11
It seems this is an old post, but since the subject is of ongoing interest I will post what I discovered here. It is a small difference, but it makes the code simpler: the curly brackets are optional for single-letter property names, so \pL means the same as \p{L}.

The code above from Gumbo and Scott can be written more simply like this if someone wants to allow only letters (ASCII and non-ASCII) and blank spaces:
```
preg_match("/^\pL[\pL ]+$/u",$string)
```
I also noticed that preg_match accepts an even simpler pattern, although this one no longer requires the first character to be a letter:
```
preg_match("/^[\pL ]+$/u",$string)
```
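The two patterns in this answer are not interchangeable, so it may help to see where they differ. The sketch below runs both against a few sample strings; the variable names and sample inputs are illustrative assumptions:

```php
<?php
$strict = '/^\pL[\pL ]+$/u'; // a letter first, then at least one more letter or space
$loose  = '/^[\pL ]+$/u';    // one or more letters or spaces, in any order

$samples = ['José María', ' José', 'é', '   '];

$results = [];
foreach ($samples as $s) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$s] = [preg_match($strict, $s), preg_match($loose, $s)];
}

// 'José María' matches both patterns.
// ' José', 'é', and '   ' match only the loose pattern.
var_dump($results);
```

If a leading space, a single letter, or an all-space string should be rejected, the first pattern is the safer choice.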
别跟我提以往

2020-12-10 08:13

I had already been trying the /u modifier mentioned. On Windows (PHP 5.2.16), adding the /u modifier worked fine for capturing a string containing Unicode characters. On CentOS 5 with PHP 5.2.16, however, I still could not capture a string containing Unicode characters using .* (preg_match simply failed to match).

After a long time getting nowhere, and messing around with the locale settings, which changed nothing, I finally found this site.

I ran rpm -Uvh on the appropriate version of the RPM provided there, restarted Apache, and suddenly my regexes worked great!

Even though I had UTF-8 support initially, my regexes were not capturing Unicode strings until I installed the updated RPM, which also adds "Unicode properties support". I thought having UTF-8 support would be enough, but apparently not.
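One way to tell which of the two features a given PHP build has is to probe PCRE from PHP itself. This is only a rough sketch: it assumes that a pattern using an unsupported feature fails to compile, in which case preg_match emits a warning and returns false. The variable names are illustrative.

```php
<?php
// The @ operator suppresses the compile warning raised when a feature is missing.

// UTF-8 mode: /u must compile, and "." must match one multibyte character.
$hasUtf8 = @preg_match('/^.$/u', 'é') === 1;

// Unicode properties: \pL must compile and match a non-ASCII letter.
$hasUnicodeProps = @preg_match('/^\pL$/u', 'é') === 1;

echo 'PCRE version: ', PCRE_VERSION, "\n";
echo 'UTF-8 support: ', $hasUtf8 ? 'yes' : 'no', "\n";
echo 'Unicode properties support: ', $hasUnicodeProps ? 'yes' : 'no', "\n";
```

Running this on both the Windows and the Linux machine should show whether the two PCRE builds differ, which would explain why the same pattern behaves differently.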
