PHP - support for multibyte safe regular expressions

Submitted by 梦想的初衷 on 2019-12-22 09:34:35

Question


PHP supports regular expressions in three ways:

  • POSIX ERE, now removed in PHP 7+
  • PCRE which is a core component, but not always multibyte safe
  • Multibyte String, which is not enabled by default
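To make "not always multibyte safe" concrete: without the `u` modifier, PCRE works on bytes, so one non-ASCII character in a UTF-8 string counts as several characters. The following minimal sketch uses only the always-available PCRE functions and deliberately avoids mbstring, since that extension may not be enabled. It assumes PHP 7.0+ for the `\u{...}` string escapes.

```php
<?php
// A UTF-8 string: "é" (U+00E9) is 2 bytes, "梦" (U+68A6) is 3 bytes.
$s = "\u{00E9}\u{68A6}";

// strlen() counts bytes, not characters.
$bytes = strlen($s);

// Without /u, PCRE treats the subject as bytes, so "." matches one byte.
$byteMatches = preg_match_all('/./s', $s);

// With /u, PCRE runs in UTF-8 mode and "." matches one code point.
$codePoints = preg_match_all('/./su', $s);

printf("bytes=%d, without u=%d, with u=%d\n", $bytes, $byteMatches, $codePoints);
```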

Today the web is Unicode, and so is PHP since 5.6, which made UTF-8 the default value of default_charset. While PHP itself is known to be abysmally bad at supporting Unicode, the Intl extension provides access to the ICU library, which comes as a relief.

To avoid the long wait for UString, and the repetition (and memory use) that doing it right otherwise involves, I prefer Intl: I leave out iconv, Multibyte String and DateTime, and rewrite most of the SBCS string functions to be multibyte safe. In that process some issues arise:
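As an illustration of rewriting an SBCS function, here is a sketch of a multibyte-safe replacement for `strrev()`. `strrev()` reverses bytes, which corrupts UTF-8. This version splits the string into grapheme clusters with PCRE's `\X` in UTF-8 mode, so a base letter and its combining mark stay together. The name `utf8_strrev` is hypothetical, and returning `null` for invalid UTF-8 is a choice made for this sketch (PHP 7.1+ assumed).

```php
<?php
/**
 * Hypothetical multibyte-safe replacement for strrev().
 * Reverses a UTF-8 string by grapheme cluster (\X), so
 * "e" followed by U+0308 COMBINING DIAERESIS is kept as one unit.
 * Returns null if the input is not valid UTF-8.
 */
function utf8_strrev(string $s): ?string
{
    // /u enables UTF-8 mode; preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all('/\X/u', $s, $m) === false) {
        return null;
    }
    return implode('', array_reverse($m[0]));
}

// "noël" written with a decomposed ë (e + combining diaeresis).
$word = "noe\u{0308}l";

// Byte reversal splits the 2-byte combining mark, producing invalid UTF-8.
var_dump(preg_match('//u', strrev($word))); // bool(false)

// Grapheme-wise reversal keeps the diaeresis on the e ("lëon").
echo utf8_strrev($word), "\n";
```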

  • Locale-aware formatting of large numbers is problematic on 32-bit platforms (such as NASes) when a database stores 64-bit numbers. This can be solved by handling the numbers as strings with BCMath.
  • The Intl wrapper does not expose ICU's regular expression functions, so the Unicode mode of PCRE is what remains.
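For the first point, here is a sketch of the string-based approach. A 64-bit value such as `9223372036854775807` is kept as a string (the form in which a database driver may return it, and the form BCMath works with), and digit grouping is done with PCRE instead of `number_format()` or `NumberFormatter::format()`. Those functions expect an int or float, and on a 32-bit build a value this large can only be a float, which cannot represent it exactly. `group_digits()` is a hypothetical helper. It only handles optionally signed integer strings, and it takes the separator as a parameter rather than deriving it from a locale.

```php
<?php
/**
 * Hypothetical helper: insert a thousands separator into an integer
 * kept as a string, so the value never passes through int or float.
 * Only handles optionally signed integer strings; the separator is
 * passed in rather than read from a locale (an assumption of this sketch).
 */
function group_digits(string $digits, string $sep = ','): string
{
    // \z (not $) so a trailing newline is rejected too.
    if (!preg_match('/^-?\d+\z/', $digits)) {
        throw new InvalidArgumentException("Not an integer string: $digits");
    }
    // Match every inner position followed by a multiple of three digits
    // up to the end; \B skips the start and the spot right after "-".
    // A callback is used so "$" or "\" in $sep is inserted literally.
    return preg_replace_callback(
        '/\B(?=(?:\d{3})+\z)/',
        function (array $m) use ($sep): string { return $sep; },
        $digits
    );
}

$max64 = '9223372036854775807'; // e.g. a BIGINT column fetched as a string

echo group_digits($max64), "\n";          // 9,223,372,036,854,775,807
echo group_digits('-1234567', '.'), "\n"; // -1.234.567
```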

To use PCRE with Unicode syntax, PHP's built-in PCRE has to be compiled and configured with Unicode support. On some systems it is not configured for Unicode; adding (*UTF8) at the start of the expression overrides the configuration.
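In PHP, the documented per-pattern switch for UTF-8 mode is the `u` modifier. The following short sketch shows what that mode provides and one pitfall. Unicode properties such as `\p{Han}` then operate on whole code points. A subject that is not valid UTF-8, however, makes the `preg_*` call return `false`, with `preg_last_error()` set to `PREG_BAD_UTF8_ERROR`, instead of matching. `count_han()` is a hypothetical name used only for illustration (PHP 7.1+ assumed).

```php
<?php
/**
 * Hypothetical helper: count Han (CJK) characters in a UTF-8 string.
 * Returns null when the subject is not valid UTF-8.
 */
function count_han(string $s): ?int
{
    // The u modifier enables UTF-8 mode, so \p{Han} is tested against
    // whole code points; it also makes PCRE validate the subject.
    $n = preg_match_all('/\p{Han}/u', $s);
    if ($n === false) {
        // For invalid UTF-8, preg_last_error() is PREG_BAD_UTF8_ERROR.
        return null;
    }
    return $n;
}

var_dump(count_han("\u{68A6}\u{60F3} abc"));        // int(2)
var_dump(count_han("abc\xff"));                     // NULL: 0xFF never occurs in UTF-8
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```

Returning `null` rather than `0` keeps "no Han characters" distinguishable from "this was not UTF-8 at all", which is easy to conflate when a failed `preg_*` call's `false` is compared loosely.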

  • have I missed a way to work with ICU's regular expression functions from PHP?
  • are there any other pitfalls to take into account for Unicode PCRE?
  • have I missed a reason why Multibyte String should be used?

Source: https://stackoverflow.com/questions/44482056/php-support-for-multibyte-safe-regular-expressions
