Need regex for utf8 multilingual search query

Submitted by 五迷三道 on 2019-12-13 08:56:21

Question


I need a regex to use with PHP's preg_replace function to clean up search form input before passing it to a full-text search in a multilingual utf8 MySQL database. I considered using PHP's filter_var with FILTER_SANITIZE_STRING, but I ended up with preg_replace.

I want these features:

  1. Keep spaces, collapsing a run of consecutive spaces into a single one.
  2. Keep double quotes, collapsing a run of consecutive quotes into a single one (so that I can use them for phrases in IN BOOLEAN MODE).
  3. Keep -, + and ~, collapsing a run of the same character into a single one.
  4. Since the search must be multilingual, Unicode (utf8) letters should be kept too.
  5. I do not have, and do not need to handle, accents.

This is what I have done:

$q = addslashes($q);
$q = preg_replace('/[^\w\d\s\s+\p{L}]/u', "", $q);

But the output is not what I want: for example, double quotes (") and minus signs (-) are stripped. How can I build a safe query string from my search box input?

Are there any better practices than using preg_replace?


Answer 1:


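You need two preg_replace calls.

1- Remove every character that is not allowed (the u modifier is needed so that the pattern and the input are treated as UTF-8 rather than as single bytes):

$q = preg_replace('/[^\p{L}\d\s~+"-]+/u', '', $q);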

2- Collapse a run of the same character (whitespace, ~, +, ", -) into a single one:

$q = preg_replace('/([\s~+"-])\1+/u', '$1', $q);
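The two steps above can be wrapped in a single helper. This is a minimal sketch, assuming PHP 7 or later with the standard PCRE extension. The function name sanitize_search_query is hypothetical and not part of the original answer. Both patterns in the sketch carry the u modifier so that multilingual input is matched as UTF-8 characters rather than as single bytes.

```php
<?php
// Hypothetical helper name, used here for illustration only.
function sanitize_search_query(string $q): string
{
    // Step 1: remove everything except letters of any script (\p{L}),
    // digits, whitespace, and the characters ~ + " -.
    // preg_replace() returns null on error (for example, on invalid
    // UTF-8 input when the u modifier is set), so fall back to ''.
    $q = preg_replace('/[^\p{L}\d\s~+"-]+/u', '', $q) ?? '';

    // Step 2: collapse a run of the same kept character into one.
    // \1 is a backreference to whichever character the group captured.
    $q = preg_replace('/([\s~+"-])\1+/u', '$1', $q) ?? '';

    return $q;
}

echo sanitize_search_query('héllo   wörld!! ""foo""  --bar'), "\n";
// prints: héllo wörld "foo" -bar
```

A cleanup like this only normalizes the search syntax. It is not a replacement for SQL escaping, so the resulting string should still be passed to MySQL as a bound parameter of a prepared statement rather than concatenated into the query.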


Source: https://stackoverflow.com/questions/17517313/need-regex-for-utf8-multilingual-search-query
