Question
I am trying to split a sentence into groups of at most 32 characters with a regex. If the 32nd character falls in the middle of a word, the split should happen at a word boundary so that the word stays whole. However, when the input contains a word with "-" in it, the regex splits that word too.
This is the regex I am using:
(\b.{1,32}\b\W?)
Input string:
Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack
Resulting groups:
- Half Bone-in Spiral int with
- dark Packd Smithfield Half Bone-
- in Spiral Ham with Glaze Pack
In the split above, "Bone-in" is one word, but the regex splits it as if it were two separate words. How can I modify my regex so that a word containing "-" is treated as one word? In short, I want the split after Bone-in.
Thank You.
Answer 1:
You may use
(\b.{1,32}(?![\w-])\W?)
Details
- \b - a word boundary
- .{1,32} - 1 to 32 chars other than line break chars, as many as possible
- (?![\w-]) - a negative lookahead: the char immediately to the right of the current location cannot be a word char (letter, digit or _) or a - char
- \W? - an optional non-word char.
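To see why the lookahead helps, the sketch below checks whether each zero-width assertion holds at a given position inside "Bone-in". The class name BoundaryCheck and the helper holdsAt are hypothetical names made up for this illustration; only the two assertions come from the patterns discussed above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryCheck {
    // Hypothetical helper: returns true if the zero-width assertion
    // matches exactly at index pos of text.
    public static boolean holdsAt(String assertion, String text, int pos) {
        Matcher m = Pattern.compile(assertion).matcher(text);
        // find(pos) resets the matcher and searches from pos onward;
        // start() == pos confirms the first match is right at pos.
        return m.find(pos) && m.start() == pos;
    }

    public static void main(String[] args) {
        String word = "Bone-in";
        int afterHyphen = word.indexOf('-') + 1; // position between '-' and 'i'

        // \b holds between '-' and 'i', so the original regex may stop there.
        System.out.println(holdsAt("\\b", word, afterHyphen));            // true
        // The lookahead fails there because the next char is a word char.
        System.out.println(holdsAt("(?![\\w-])", word, afterHyphen));     // false
        // At the end of the word nothing follows, so the lookahead holds.
        System.out.println(holdsAt("(?![\\w-])", word, word.length()));   // true
    }
}
```

Since the lookahead fails both right before and right after the hyphen, .{1,32} has to backtrack to a position where neither a word char nor "-" follows.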
In Java, use the following method:
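import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String[] splitIncludeDelimeter(String regex, String text) {
    List<String> list = new LinkedList<>();
    Matcher matcher = Pattern.compile(regex).matcher(text);
    int now, old = 0;
    // each chunk runs from the end of the previous match to the end of the current one
    while (matcher.find()) {
        now = matcher.end();
        list.add(text.substring(old, now));
        old = now;
    }
    // no match at all: return the whole text as a single element
    if (list.isEmpty())
        return new String[]{text};
    // adding the rest of the text as the last element (may be an empty string)
    String finalElement = text.substring(old);
    list.add(finalElement);
    return list.toArray(new String[0]);
}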
Java example:
String s = "Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack";
String[] res = splitIncludeDelimeter("(\\b.{1,32}(?![\\w-])\\W?)", s);
System.out.println(Arrays.toString(res));
// => [Half Bone-in Spiral int with , dark Packd Smithfield Half , Bone-in Spiral Ham with Glaze , Pack, ]
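If the trailing spaces and the empty last element are not wanted, one possible variant is to collect the matched text directly and trim it. This is only a sketch: the class name ChunkSplitter and the method chunks are hypothetical, and unlike splitIncludeDelimeter it drops any text the pattern does not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChunkSplitter {
    // Same pattern as in the answer, without the capturing group.
    private static final Pattern CHUNK =
            Pattern.compile("\\b.{1,32}(?![\\w-])\\W?");

    // Hypothetical helper: returns each match with surrounding whitespace removed.
    public static List<String> chunks(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = CHUNK.matcher(text);
        while (m.find()) {
            out.add(m.group().trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Half Bone-in Spiral int with dark Packd Smithfield Half Bone-in Spiral Ham with Glaze Pack";
        System.out.println(chunks(s));
        // => [Half Bone-in Spiral int with, dark Packd Smithfield Half, Bone-in Spiral Ham with Glaze, Pack]
    }
}
```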
Source: https://stackoverflow.com/questions/53601310/splitting-string-in-regex-with-as-one-word