regex split text document into sentences

Submitted by 允我心安 on 2020-01-04 06:32:10

Question


I have a big text string and I am trying to split it into sentences on ".", "?" and "!". But my regex is not working, and I can't see why. Can somebody help me find the error?

String str = "When my friend said he likes deep dish pizza one day, I immediately set a time to come back to Little Star. Arguably, the best deep dish pizza in SF...though...I don't believe there are many places that do deep dish pizza. That being said...its not the BEST ever, just the best for the area. They use cornmeal in the crust, or on the baking surface, so there's a bit of extra crunch to it. That being said...I'm not sure how much I like the cornmeal texture to my pizza. I kind of want just a GOOD CRUST, you know? No extra stuff to try to make it more crunchy.";
String[] sentences = str.split("/(?<=[.?!])\\S+(?=[a-z])/i");

But it does not split the sentences at all. Can somebody spot the error?
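To see why the call returns the text unsplit, here is a small sketch (the class and method names are hypothetical). In Java the whole argument to split() is the pattern, so the leading "/" and the trailing "/i" are literal characters to match. The lookbehind (?<=[.?!]) then sits right after a literal "/", so it would need "/" to be one of ".", "?" or "!", which can never happen.

```java
import java.util.Arrays;

// Hypothetical demo class: shows that Java treats "/.../i" as literal text.
class SlashDemo {
    static String[] splitWithSlashes(String text) {
        // The question's pattern, verbatim. "/" and "/i" are matched literally,
        // and the lookbehind must then see [.?!] just before a "/", which is impossible.
        return text.split("/(?<=[.?!])\\S+(?=[a-z])/i");
    }

    public static void main(String[] args) {
        String[] parts = splitWithSlashes("Little Star. Arguably, the best.");
        // prints: 1 [Little Star. Arguably, the best.]
        System.out.println(parts.length + " " + Arrays.toString(parts));
    }
}
```

When the pattern never matches, String.split returns a one-element array holding the original string, which is exactly the "not splitting" symptom.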


Answer 1:


Your regex is wrong. Java does not understand PCRE-style regex literals with slash delimiters, like this one:

/(?<=[.?!])\\S+(?=[a-z])/i

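Use this instead. Besides dropping the slashes and moving the "i" flag inside as "(?i)", use lowercase \s+ as the separator: the gap between sentences is whitespace, whereas \S+ matches non-whitespace.

String[] sentences = str.split("(?i)(?<=[.?!])\\s+(?=[a-z])");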
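The inline "(?i)" is Java's way of writing the "i" flag; passing Pattern.CASE_INSENSITIVE to Pattern.compile does the same thing. The flag matters here because the lookahead [a-z] would otherwise reject the capital letter that starts each sentence. The sketch below (hypothetical class name and sample text) assumes the separator is the whitespace between sentences, so it uses \s+; an uppercase \S+ matches non-whitespace and cannot match that gap.

```java
import java.util.regex.Pattern;

// Hypothetical demo comparing the inline (?i) flag with Pattern.CASE_INSENSITIVE.
class FlagDemo {
    // Whitespace preceded by ., ? or ! and followed by a letter.
    static final String BOUNDARY = "(?<=[.?!])\\s+(?=[a-z])";

    static String[] splitInline(String text) {
        // (?i) at the start turns on case-insensitive matching for the whole pattern.
        return text.split("(?i)" + BOUNDARY);
    }

    static String[] splitWithFlag(String text) {
        // Same behaviour, with the flag passed to Pattern.compile instead.
        return Pattern.compile(BOUNDARY, Pattern.CASE_INSENSITIVE).split(text);
    }

    static String[] splitCaseSensitive(String text) {
        // Without the flag, [a-z] only matches lowercase letters.
        return text.split(BOUNDARY);
    }

    public static void main(String[] args) {
        String text = "I like it. They use cornmeal. no capital here.";
        // prints: 3 3 2
        System.out.println(splitInline(text).length + " "
                + splitWithFlag(text).length + " "
                + splitCaseSensitive(text).length);
    }
}
```

Without the flag only the boundary before the lowercase "no" is found, so "I like it. They use cornmeal." stays glued together.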



Answer 2:


Here's a little tip:

Slashes have nothing whatsoever to do with regex.

Slashes are a syntax artefact of *some* host languages (for example, regex literals in JavaScript and Perl). Java is not one of them.
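If you are porting patterns written as /pattern/flags literals, one way to carry them over is to strip the delimiters and map each trailing flag letter to the matching java.util.regex.Pattern constant. The helper below is a hypothetical sketch, not a JDK API; it only handles the i, m, s and x flags and rejects anything else.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the JDK): turns "/pattern/flags" into a Pattern.
class DelimitedRegex {
    static Pattern fromDelimited(String literal) {
        int close = literal.lastIndexOf('/');
        if (!literal.startsWith("/") || close <= 0) {
            throw new IllegalArgumentException("expected /pattern/flags: " + literal);
        }
        // Everything between the slashes is the actual regex.
        String body = literal.substring(1, close);
        int flags = 0;
        // Each character after the closing slash is a modifier letter.
        for (char c : literal.substring(close + 1).toCharArray()) {
            switch (c) {
                case 'i': flags |= Pattern.CASE_INSENSITIVE; break; // same as (?i)
                case 'm': flags |= Pattern.MULTILINE; break;        // same as (?m)
                case 's': flags |= Pattern.DOTALL; break;           // same as (?s)
                case 'x': flags |= Pattern.COMMENTS; break;         // same as (?x)
                default: throw new IllegalArgumentException("unsupported flag: " + c);
            }
        }
        return Pattern.compile(body, flags);
    }

    public static void main(String[] args) {
        Pattern p = fromDelimited("/(?<=[.?!])\\s+(?=[a-z])/i");
        // prints: 2
        System.out.println(p.split("Little Star. Arguably, the best.").length);
    }
}
```

In practice, writing the pattern directly with an inline flag such as "(?i)" is simpler; a helper like this only pays off when many delimited patterns have to be reused as-is.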

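Try removing the slashes and replacing the trailing "/i" with a leading "(?i)". Also use \s+ (whitespace) rather than \S+ (non-whitespace) as the separator, since sentences are separated by spaces:

String[] sentences = str.split("(?i)(?<=[.?!])\\s+(?=[a-z])");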
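Here is a runnable sketch on the question's own text (the class name is hypothetical). It uses the slash-free pattern with "(?i)" and the whitespace separator \s+. Because the lookbehind must be followed directly by whitespace, the ellipses in "SF...though...I" and "said...its" are not treated as sentence ends.

```java
// Hypothetical runner applying the corrected pattern to the question's text.
class SentenceSplitter {
    static final String TEXT =
        "When my friend said he likes deep dish pizza one day, I immediately set a time to come back to Little Star. "
        + "Arguably, the best deep dish pizza in SF...though...I don't believe there are many places that do deep dish pizza. "
        + "That being said...its not the BEST ever, just the best for the area. "
        + "They use cornmeal in the crust, or on the baking surface, so there's a bit of extra crunch to it. "
        + "That being said...I'm not sure how much I like the cornmeal texture to my pizza. "
        + "I kind of want just a GOOD CRUST, you know? No extra stuff to try to make it more crunchy.";

    static String[] split(String text) {
        // Split on whitespace after ., ? or ! when a letter (any case) follows.
        return text.split("(?i)(?<=[.?!])\\s+(?=[a-z])");
    }

    public static void main(String[] args) {
        String[] sentences = split(TEXT);
        for (int i = 0; i < sentences.length; i++) {
            System.out.println((i + 1) + ": " + sentences[i]);
        }
    }
}
```

This yields seven sentences. It remains a heuristic: an abbreviation such as "Dr. Smith" would also be split, because "." followed by whitespace and a letter looks the same as a sentence end.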


Source: https://stackoverflow.com/questions/17654738/regex-split-text-document-into-sentences
