Question
I would like to split a text such as
过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!
using one of these three (or more) characters as the delimiter: ?, ! or 。.
I can of course do this with
$lines = preg_split('/[。!?]/u', $body);
However, I want each resulting line to keep its ending delimiter. Also, a sentence might end with a run of delimiters, such as 啊。。。
or 什么!??!!!!
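A small sketch of the problem: splitting on single delimiter characters, as in the question, consumes each delimiter. The delimiters disappear, and every extra delimiter in a run produces an empty piece. The sample string is made up for illustration.

```php
<?php
$body = '啊。。。什么!';
// Each delimiter character is a separate split point and is consumed,
// so the delimiters are lost and consecutive delimiters yield empty strings.
$lines = preg_split('/[。!?]/u', $body);
var_dump($lines);
```

The result is five pieces: '啊', '', '', '什么', ''.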
Answer 1:
Try this:
$lines = preg_split('/(?<=[。!?])(?![。!?])/u',$body);
It splits at a position that's preceded by one of your delimiter characters but not followed by one. It doesn't consume the delimiter, and if there are two or more consecutive delimiters, it only matches after the last one.
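A minimal runnable sketch of this answer. One addition not in the answer: when the text ends with a delimiter, the end of the string also matches the zero-width pattern, so preg_split returns a trailing empty string. The sketch passes the PREG_SPLIT_NO_EMPTY flag to drop it.

```php
<?php
$body = '过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!';
// Zero-width split point: preceded by a delimiter, not followed by one.
// PREG_SPLIT_NO_EMPTY drops the empty piece produced at the end of the string.
$lines = preg_split('/(?<=[。!?])(?![。!?])/u', $body, -1, PREG_SPLIT_NO_EMPTY);
print_r($lines);

// A run of consecutive delimiters stays attached to its sentence.
$runs = preg_split('/(?<=[。!?])(?![。!?])/u', '啊。。。什么!??!!!!', -1, PREG_SPLIT_NO_EMPTY);
print_r($runs);
```

The /u modifier matters here. Without it, the lookbehind inspects single bytes, and 。 is three bytes in UTF-8.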
Answer 2:
In this case you may want to write the string splitter yourself, keeping each run of consecutive delimiters together as a whole. You can use a state variable that records whether you are currently in a text block or a delimiter block.
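A minimal sketch of the state-variable approach, assuming the delimiters are single characters. The function name splitSentences and its $delimiters parameter are hypothetical, chosen for illustration. preg_split('//u', ...) is used only to break the UTF-8 string into characters rather than bytes.

```php
<?php
// Hypothetical helper: split $text into sentences, keeping each run of
// consecutive delimiters attached to the sentence it ends.
function splitSentences(string $text, array $delimiters = ['。', '!', '?']): array
{
    // Split the UTF-8 string into individual characters (not bytes).
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $sentences = [];
    $current = '';
    $inDelimiterBlock = false; // state: are we inside a run of delimiters?

    foreach ($chars as $ch) {
        $isDelimiter = in_array($ch, $delimiters, true);
        if ($inDelimiterBlock && !$isDelimiter) {
            // The delimiter run just ended, so the current sentence is complete.
            $sentences[] = $current;
            $current = '';
        }
        $current .= $ch;
        $inDelimiterBlock = $isDelimiter;
    }
    if ($current !== '') {
        // Keep trailing text, whether or not it ends with delimiters.
        $sentences[] = $current;
    }
    return $sentences;
}

print_r(splitSentences('啊。。。什么!??!!!!好'));
```

Unlike a regex, this version keeps trailing text that has no final delimiter, as with '好' above.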
Answer 3:
You should use preg_match_all instead of preg_split, i.e.:
preg_match_all("/[^?!。]+[?!。]+/u", $text, $res);
See http://www.ideone.com/rN7MB for a usage example.
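A short sketch of reading the result. Each match is a run of non-delimiters followed by a run of delimiters, and the full matches are stored in $res[0]. The second call illustrates a limitation of this pattern: text after the last delimiter is not matched at all.

```php
<?php
$text = '过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!';
// Non-delimiters followed by one or more delimiters = one sentence.
preg_match_all('/[^?!。]+[?!。]+/u', $text, $res);
// $res[0] holds the full matches, i.e. the sentences with their delimiters.
print_r($res[0]);

// The trailing "好" has no delimiter, so it does not appear in $tail[0].
preg_match_all('/[^?!。]+[?!。]+/u', '啊。。。好', $tail);
print_r($tail[0]);
```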
Source: https://stackoverflow.com/questions/3437982/split-by-various-delimiters-while-keeping-the-delimiter