Split by various delimiters, while keeping the delimiter?

Submitted by 雨燕双飞 on 2019-12-22 18:25:57

Question


I would like to split a text such as
过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!

using one of these three (or more) characters, ?!。, as the delimiter. I can of course do this with
$lines = preg_split('/[。!?]/u', $body);

However, I want the resulting lines to keep their ending delimiter. Also, a sentence might end with a run of delimiters, like 啊。。。 or 什么!??!!!!


Answer 1:


Try this:

$lines = preg_split('/(?<=[。!?])(?![。!?])/u',$body);

It splits at a position that's preceded by one of your delimiter characters but not followed by one. It doesn't consume the delimiter, and if there are two or more consecutive delimiters, it only matches after the last one.
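As a quick check of the behaviour described above, here is a minimal sketch. The wrapper name split_keep_delimiters is hypothetical, and the PREG_SPLIT_NO_EMPTY flag is an addition not in the original answer: without it, a text that ends in a delimiter produces one empty string at the end of the result. The input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
// Assumes $body is valid UTF-8; preg_split returns false on invalid input.
function split_keep_delimiters($body)
{
    // (?<=[。!?])  the position is preceded by a delimiter
    // (?![。!?])   and is not followed by another delimiter
    // PREG_SPLIT_NO_EMPTY (an addition) drops the empty piece at the end of the text.
    return preg_split('/(?<=[。!?])(?![。!?])/u', $body, -1, PREG_SPLIT_NO_EMPTY);
}

// Sample text from the question: three sentences, each keeps its delimiter.
print_r(split_keep_delimiters('过公元年?因为无论你如何选择。简体字危及了对古代文学的研究输入!'));

// Runs of delimiters stay attached to the sentence they end.
print_r(split_keep_delimiters('啊。。。什么!??!!!!'));
```

Because the pattern is zero-width, nothing is consumed: joining the pieces with implode('', ...) gives back the original text.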




Answer 2:


In this case, you may want to write the string splitter yourself and keep each run of consecutive delimiters as a whole. You can use a state variable that indicates whether the scanner is currently in a text block or a delimiter block.
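One way such a hand-written splitter could look is sketched below. The function name split_with_state, its parameters, and the default delimiter list are all assumptions for illustration; the answer itself gives no code. The state variable records whether the previous character was a delimiter, and a chunk is closed when a delimiter run ends.

```php
<?php
// Hypothetical helper, not from the original answer.
// Assumes $text is valid UTF-8.
function split_with_state($text, $delimiters = ['。', '!', '?'])
{
    $chunks = [];
    $current = '';
    $inDelimiters = false; // state: was the previous character a delimiter?

    // preg_split with an empty pattern and the /u flag yields one UTF-8 character per element.
    foreach (preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $isDelimiter = in_array($char, $delimiters, true);

        // Leaving a delimiter block and entering a text block: close the chunk.
        if ($inDelimiters && !$isDelimiter) {
            $chunks[] = $current;
            $current = '';
        }

        $current .= $char;
        $inDelimiters = $isDelimiter;
    }

    // Keep whatever is left, including trailing text without a delimiter.
    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}

print_r(split_with_state('啊。。。什么!??!!!!'));
```

Compared with a regular expression this is longer, but the delimiter list is an ordinary array, so it is easy to extend or to pass in at run time.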




Answer 3:


You should use preg_match_all instead of preg_split, i.e.

preg_match_all("/[^?!。]+[?!。]+/u", $text, $res);

See http://www.ideone.com/rN7MB for usage.
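A short sketch of how the result can be read, using the example endings from the question plus some trailing text that is an assumption added here to show an edge case. The full matches end up in $res[0]. Note that this pattern requires at least one delimiter per match, so trailing text without a delimiter is not captured; changing the final + to * (a variation, not part of the original answer) keeps it.

```php
<?php
// Example endings from the question, followed by trailing text with no delimiter
// (the trailing part is an assumption added for illustration).
$text = '啊。。。什么!??!!!!没有结尾';

// Pattern from the answer: a run of non-delimiters followed by a run of delimiters.
preg_match_all('/[^?!。]+[?!。]+/u', $text, $res);
print_r($res[0]); // full matches; the trailing text is missing

// Variation: allow zero delimiters so the trailing text is matched too.
preg_match_all('/[^?!。]+[?!。]*/u', $text, $resAll);
print_r($resAll[0]);
```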



Source: https://stackoverflow.com/questions/3437982/split-by-various-delimiters-while-keeping-the-delimiter
