Filter all types of whitespace in PHP

盖世英雄少女心 2020-12-19 12:27

I know that there are many types of space (em space, en space, thin space, non-breaking space, etc.), but all of the ones I mentioned have HTML entities (at least, PHP's htm

3 Answers
  • 2020-12-19 12:58
    $result = preg_replace('/\s/', '', $yourString);
    

    See http://www.php.net/manual/en/regexp.reference.backslash.php for more information on \s.

  • 2020-12-19 13:02

    By default, \s will not match whitespace characters with code points above 127. To get at those, you can instead use other UTF-8-aware sequences.


    (Standard disclaimer: I'm skimming the PCRE source code to compile the lists below, so I may have missed a character or typed something incorrectly. Please forgive me.)

    \p{Zs} matches:

    • U+0020 Space
    • U+00A0 No-break space
    • U+1680 Ogham space mark
    • U+180E Mongolian vowel separator
    • U+2000 En quad
    • U+2001 Em quad
    • U+2002 En space
    • U+2003 Em space
    • U+2004 Three-per-em space
    • U+2005 Four-per-em space
    • U+2006 Six-per-em space
    • U+2007 Figure space
    • U+2008 Punctuation space
    • U+2009 Thin space
    • U+200A Hair space
    • U+202F Narrow no-break space
    • U+205F Medium mathematical space
    • U+3000 Ideographic space
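
    As a rough illustration of the list above, here is a minimal sketch that strips two of the listed separators with \p{Zs}. The variable names and sample text are made up for the example, and the "\u{...}" string escapes assume PHP 7 or later.

    ```php
    <?php
    // Sample text with a no-break space (U+00A0), an em space (U+2003)
    // and one ordinary space (U+0020).
    $sample = "a\u{00A0}b\u{2003}c d";

    // The u modifier makes PCRE read the subject as UTF-8, so the
    // multi-byte separators are seen as single characters.
    $stripped = preg_replace('/\p{Zs}+/u', '', $sample);

    var_dump($stripped); // string(4) "abcd"
    ```

    Note that \p{Zs} on its own leaves tabs and line breaks in place, since those are control characters rather than space separators.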

    \h (Horizontal whitespace) matches the same as \p{Zs} above, plus

    • U+0009 Horizontal tab.
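
    One possible use of \h, sketched under the same assumptions (PHP 7+ string escapes, invented sample text), is collapsing runs of horizontal whitespace while leaving line breaks alone:

    ```php
    <?php
    // Tab, no-break space and ideographic space between words, plus a newline.
    $sample = "one\t\u{00A0}two\u{3000}three\nfour";

    // Replace each run of horizontal whitespace with one ASCII space.
    // The newline is vertical whitespace, so \h does not touch it.
    $normalized = preg_replace('/\h+/u', ' ', $sample);

    var_dump($normalized === "one two three\nfour"); // bool(true)
    ```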

    Similarly for matching vertical whitespace there are a few options.

    \p{Zl} matches U+2028 Line separator.

    \p{Zp} matches U+2029 Paragraph separator.

    \v (Vertical whitespace) matches \p{Zl}, \p{Zp} and the following

    • U+000A Linefeed
    • U+000B Vertical tab
    • U+000C Formfeed
    • U+000D Carriage return
    • U+0085 Next line
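
    To see \v at work, the following sketch joins lines that are separated by different kinds of vertical whitespace. The sample string is invented for the example and again relies on PHP 7+ escapes.

    ```php
    <?php
    // CR+LF, a line separator (U+2028) and a next-line character (U+0085).
    $sample = "line1\r\nline2\u{2028}line3\u{0085}line4";

    // Each run of vertical whitespace becomes a single space;
    // "\r\n" counts as one run, so it yields one space, not two.
    $joined = preg_replace('/\v+/u', ' ', $sample);

    var_dump($joined); // string(23) "line1 line2 line3 line4"
    ```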

    Going back to the beginning, in UTF-8 mode (i.e. using the u pattern modifier) \s will match any character that \p{Z} matches (which is anything that \p{Zs}, \p{Zl} and \p{Zp} will match), plus

    • U+0009 Horizontal tab
    • U+000A Linefeed
    • U+000C Formfeed
    • U+000D Carriage return

    To cut a long story short (I bet you read all of the above, didn't you?) you might want to use \s but make sure to be in UTF-8 mode like /\s/u. Putting that to some practical use, to filter out those matching whitespace characters from a string you would do something like

    $new_string = preg_replace('/\s/u', '', $old_string);
    

    Finally, if you really, really care about the vertical whitespace characters which aren't included in the \s list above (VT and NEL), then you can use the character class [\s\v] to match all 26 of the whitespace characters listed above.
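
    If it helps, the [\s\v] class could be wrapped in a small helper along these lines. This is only a sketch: strip_all_whitespace() is a hypothetical name, not a built-in, and the null check is there because preg_replace() returns null on failure, which with the u modifier includes a subject that is not valid UTF-8.

    ```php
    <?php
    /**
     * Hypothetical helper: removes every character matched by \s or \v.
     */
    function strip_all_whitespace(string $input): string
    {
        $result = preg_replace('/[\s\v]+/u', '', $input);

        if ($result === null) {
            // With /u, malformed UTF-8 input makes preg_replace() return null.
            throw new InvalidArgumentException(
                'preg_replace failed, error code ' . preg_last_error()
            );
        }

        return $result;
    }

    // Space, no-break space, tab, vertical tab, NEL, paragraph separator, LF.
    $clean = strip_all_whitespace(" a\u{00A0}b\tc\u{000B}d\u{0085}e\u{2029}f\n");

    var_dump($clean); // string(6) "abcdef"
    ```

    Throwing on failure is just one design choice; returning the input unchanged or converting it to UTF-8 first would be equally reasonable depending on where the string comes from.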

  • 2020-12-19 13:10

    They are all plain spaces (returning character code 32) that can be caught with regular expressions or trim().

    Try this:

    $text = preg_replace('/\s{2,}/', ' ', $text);
    