I'm trying to use a regex to split a chunk of Chinese text into sentences. For my purposes, the sentence delimiters are 。 (U+3002), ! (U+FF01), and ? (U+FF1F).
To keep each sentence together with its trailing delimiter, match the whole sentence inside a capture group and pass PREG_SPLIT_DELIM_CAPTURE to preg_split:
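// Sentence-ending marks: U+3002 (。), U+FF01 (!), U+FF1F (?)
$str = "你好。你好吗? 我是程序员,不太懂这个我问题,希望大家能够帮忙!一起加油吧!";
// The group captures a run of non-delimiters, one delimiter, and any trailing whitespace.
$arr = preg_split('/\s*([^\x{3002}\x{FF01}\x{FF1F}]+[\x{3002}\x{FF01}\x{FF1F}]\s*)/u',
    $str, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
var_dump($arr);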
OUTPUT:
array(4) {
[0]=> string(9) "你好。"
[1]=> string(13) "你好吗? "
[2]=> string(72) "我是程序员,不太懂这个我问题,希望大家能够帮忙!"
[3]=> string(18) "一起加油吧!"
}
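Note that the second element above keeps the space that followed the question mark. As an alternative sketch (not part of the original answer), a lookbehind can split right after each delimiter and consume that whitespace, so no capture group is needed. The helper name `splitChineseSentences` is hypothetical, and the delimiter set is assumed to be the same three code points used above.

```php
<?php
// Hypothetical helper: split after each sentence-ending mark using a
// lookbehind instead of a capture group.
function splitChineseSentences(string $text): array
{
    // (?<=[...]) asserts that the previous character is U+3002, U+FF01 or U+FF1F;
    // \s* then consumes any whitespace that follows the delimiter.
    $pattern = '/(?<=[\x{3002}\x{FF01}\x{FF1F}])\s*/u';
    $parts = preg_split($pattern, $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure (for example, invalid UTF-8 input).
    return $parts === false ? [] : $parts;
}

$str = "你好。你好吗? 我是程序员,不太懂这个我问题,希望大家能够帮忙!一起加油吧!";
$sentences = splitChineseSentences($str);
print_r($sentences);
```

With this variant each element still ends in its delimiter, but the whitespace between sentences is dropped rather than kept at the end of the preceding element. Text after the last delimiter, if any, comes back as a final element without a delimiter.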