Question
I asked a question about punctuation and regex before, but it was confusing.
Suppose I have this text:
String text = "wor.d1, :word2. wo,rd3? word4!";
I'm doing this:
String parts[] = text.split(" ");
And I get this:
wor.d1, | :word2. | wo,rd3? | word4!
What do I need to do to get the result below? (Keep the symbols at the word borders as separate tokens, but only the ones I specify: .,!?: and not all symbols.)
wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
UPDATE
I'm getting some good results with this regex, but it produces an empty string before every split when the punctuation is at the start of a word.
Is there a way to avoid this empty string at the start?
Is this regex good, or is there a simpler way?
public static final String PUNCTUATION_SEPARATOR =
"("
+ "("
+ "(?=^[\"'!?.,;:(){}\\[\\]]+)"
+ "|"
+ "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
+ ")"
+ "|"
+ "("
+ "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
+ "|"
+ "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
+ ")"
+ ")";
Answer 1:
Are you sure you want to use regex? There's a faster implementation for splitting on single characters: StringTokenizer. It can also return the delimiters.
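import java.util.StringTokenizer;

String str = "word1, word2. word3? word4!";
String delim = ",.!?";
// The third argument (true) makes the tokenizer return the delimiters as tokens too.
StringTokenizer st = new StringTokenizer(str, delim, true);
while (st.hasMoreTokens()) {
    String token = st.nextToken();
    // ... use the token here; it will be: "word1", ",", " word2", ".", etc.
}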
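To make the behaviour concrete, here is a minimal self-contained sketch of the StringTokenizer approach. The class name TokenizerDemo and the helper tokenize are made up for illustration, and trimming each token and dropping blank ones is an added assumption, not part of the answer. Note that StringTokenizer splits on every delimiter character, so punctuation inside a word (as in "wor.d1") is split off as well, which may not be what the question wants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Hypothetical helper: returns words and delimiters as separate tokens.
    static List<String> tokenize(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: each delimiter character is returned as its own token.
        StringTokenizer st = new StringTokenizer(text, delims, true);
        while (st.hasMoreTokens()) {
            // Assumption: surrounding whitespace is not wanted, so trim and skip blanks.
            String token = st.nextToken().trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints: word1 | , | word2 | . | word3 | ? | word4 | !
        System.out.println(String.join(" | ", tokenize("word1, word2. word3? word4!", ",.!?")));
        // Punctuation inside a word is split too; prints: wor | . | d1 | , | : | word2 | .
        System.out.println(String.join(" | ", tokenize("wor.d1, :word2.", ",.!?:")));
    }
}
```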
Answer 2:
For simple separators I recommend StringTokenizer. But here's a solution using regex and an auxiliary separator:
String s = "one,two, three four , five";
s = s.replaceAll("([,\\s]+)", "#$1#");
Pattern p = Pattern.compile("#");
String[] result = p.split(s);
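As a rough illustration of what this produces, the snippet can be wrapped in a small runnable class. The names AuxSeparatorDemo and splitKeepingSeparators are invented for this sketch, and it assumes the marker character '#' never occurs in the input; if it can, a different marker would be needed.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class AuxSeparatorDemo {
    // Hypothetical helper. Assumption: '#' never appears in the input text.
    static String[] splitKeepingSeparators(String s) {
        // Surround every run of commas/whitespace with the auxiliary marker...
        String marked = s.replaceAll("([,\\s]+)", "#$1#");
        // ...then split on the marker, so the separator runs survive as tokens.
        return Pattern.compile("#").split(marked);
    }

    public static void main(String[] args) {
        String[] result = splitKeepingSeparators("one,two, three four , five");
        // prints: [one, ,, two, , , three,  , four,  , , five]
        System.out.println(Arrays.toString(result));
    }
}
```

Each separator run is kept exactly as it appeared, including its spaces, so the nine tokens are "one", ",", "two", ", ", "three", " ", "four", " , " and "five".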
Answer 3:
Here's a regex that I think will work:
/\s|(?=[\.,:?!](\W|$))|(?<=\W[\.:?!])/
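A possible way to try this regex from Java is sketched below. The surrounding slashes are delimiters and are not part of the pattern, and backslashes must be doubled inside a Java string literal. The class name RegexSplitDemo and the method splitWords are hypothetical, and the sketch relies on Java 8 or later split semantics.

```java
import java.util.Arrays;

class RegexSplitDemo {
    // Split on whitespace, before punctuation that ends a word,
    // and after punctuation that follows a non-word character.
    static final String SEPARATOR = "\\s|(?=[\\.,:?!](\\W|$))|(?<=\\W[\\.:?!])";

    // Hypothetical helper wrapping String.split with the pattern above.
    static String[] splitWords(String text) {
        return text.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String text = "wor.d1, :word2. wo,rd3? word4!";
        // prints: [wor.d1, ,, :, word2, ., wo,rd3, ?, word4, !]
        System.out.println(Arrays.toString(splitWords(text)));
    }
}
```

On the question's sample text this yields the nine tokens asked for, with the inner punctuation in "wor.d1" and "wo,rd3" left alone. The lookbehind set contains no comma, so a comma at the start of a word would not be split off.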
Answer 4:
In my opinion, this is what you want: first explode your string, then use the implode function as a second step.
Answer 5:
public static final String PUNCTUATION_SEPARATOR =
"("
+ "("
+ "(?=^[\"'!?.,;:(){}\\[\\]-]+)"
+ "|"
+ "(?<=^[\"'!?.,;:(){}\\[\\]-]+)"
+ ")"
+ "|"
+ "("
+ "(?=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
+ "|"
+ "(?<=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
+ ")"
+ ")";
Source: https://stackoverflow.com/questions/7127384/how-to-keep-the-delimiter-while-using-regex