How to keep the delimiter while using RegEx?

Posted by 坚强是说给别人听的谎言 on 2020-01-14 07:59:27

Question


I asked a question about punctuation and regex before, but it was confusing.

Suppose I have this text:

String text = "wor.d1, :word2. wo,rd3? word4!"; 

I'm doing this:

String parts[] = text.split(" ");

And I have this:

wor.d1, | :word2. | wo,rd3? | word4!

What do I need to do to get this instead? (Keep the symbols at the word borders as separate tokens, but only the ones I specify, .,!?: and not all punctuation.)

wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !

UPDATE

I'm getting some good results with the regex below, but it produces an empty string before every split on punctuation at the start of a word.

Is there a way to avoid this empty string at the start?

Is this regex good, or is there a simpler way?

public static final String PUNCTUATION_SEPARATOR =
        "("
        + "("
        + "(?=^[\"'!?.,;:(){}\\[\\]]+)"
        + "|"
        + "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
        + ")"
        + "|"
        + "("
        + "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
        + "|"
        + "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
        + ")"
        + ")";

Answer 1:


Are you sure you want to use regex? There's a faster implementation for splitting on single characters: StringTokenizer. It can also return the delimiters as tokens.

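import java.util.StringTokenizer;

String str = "word1, word2. word3? word4!";
String delim = ",.!?";
// The third argument (returnDelims = true) makes the tokenizer return each delimiter as its own token.
StringTokenizer st = new StringTokenizer(str, delim, true);
while (st.hasMoreTokens()) {
  String token = st.nextToken();
  System.out.println(token); // token will be: "word1", ",", " word2", ".", etc.
}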
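
The loop above leaves the spaces attached to the words (" word2"). One possible way around that, sketched here under my own assumptions, is to add the space to the delimiter set and discard the space tokens. The class name TokenizerSketch and the helper tokenize are hypothetical names, not part of the answer. Note that StringTokenizer splits on every occurrence of a delimiter, so punctuation inside a word (as in "wor.d1") would also be split off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerSketch {
    // Hypothetical helper: splits on spaces and on each punctuation character,
    // keeps the punctuation as separate tokens and drops the spaces.
    static List<String> tokenize(String text, String punctuation) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true, so spaces and punctuation come back as one-character tokens
        StringTokenizer st = new StringTokenizer(text, " " + punctuation, true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.equals(" ")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("word1, word2. word3? word4!", ",.!?"));
        // prints: [word1, ,, word2, ., word3, ?, word4, !]
    }
}
```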



Answer 2:


For simple separators I recommend the StringTokenizer. But here's a solution using regex and another auxiliary separator:

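import java.util.regex.Pattern;
// Wrap every run of commas/whitespace in '#' markers, then split on '#' ('#' must not occur in the input).
String s  = "one,two, three   four ,  five";
s = s.replaceAll("([,\\s]+)", "#$1#");
Pattern p = Pattern.compile("#");
String[] result = p.split(s);
// result: "one", ",", "two", ", ", "three", "   ", "four", " ,  ", "five"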



Answer 3:


Here's a regex that I think will work:

/\s|(?=[\.,:?!](\W|$))|(?<=\W[\.:?!])/
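
The regex above is written with slash delimiters. For use with String.split in Java it might look like the sketch below, with the slashes removed and the backslashes doubled. The class name KeepDelimiterSplit and the method splitKeepingPunctuation are hypothetical names for illustration. I have only traced it against the question's sample text, so treat it as a starting point rather than a general solution.

```java
class KeepDelimiterSplit {
    // The answer's regex as a Java string literal.
    // Split points: any whitespace (consumed), the position before a punctuation mark
    // that is followed by a non-word character or the end of input, and the position
    // after a punctuation mark that is itself preceded by a non-word character.
    static final String SEPARATOR = "\\s|(?=[\\.,:?!](\\W|$))|(?<=\\W[\\.:?!])";

    static String[] splitKeepingPunctuation(String text) {
        return text.split(SEPARATOR);
    }

    public static void main(String[] args) {
        String text = "wor.d1, :word2. wo,rd3? word4!";
        System.out.println(String.join(" | ", splitKeepingPunctuation(text)));
        // prints: wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
    }
}
```

Punctuation inside a word ("wor.d1", "wo,rd3") is left alone because the lookahead requires a non-word character or the end of input after the mark.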



Answer 4:


In my opinion, this is what you want: first explode your string, then, as a second step, use the implode function.




Answer 5:


public static final String PUNCTUATION_SEPARATOR =
    "("
    + "("
    + "(?=^[\"'!?.,;:(){}\\[\\]-]+)"
    + "|"
    + "(?<=^[\"'!?.,;:(){}\\[\\]-]+)"
    + ")"
    + "|"
    + "("
    + "(?=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
    + "|"
    + "(?<=[\"'!?.,;:(){}\\[\\]-]+($|\n))"
    + ")"
    + ")";


Source: https://stackoverflow.com/questions/7127384/how-to-keep-the-delimiter-while-using-regex
