I am new to Java regex. Please help me. Consider the paragraph below:
Paragraph :
Name abc
sadghsagh
hsajdjah N
The following should also do if you want to keep both Name and !!! in the results:
String[] parts = string.split("(?=(Name|!!!))");
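As a rough illustration of what this lookahead-only split does, here is a small sketch on a made-up input (the sample string and the class name LookaheadSplitDemo are assumptions, not taken from the question). Because the lookahead matches before every Name and before every !!!, each terminator lands at the start of its own piece instead of staying attached to the record it closes.

```java
class LookaheadSplitDemo {
    // Splits before every "Name" and before every "!!!" (zero-width lookahead).
    static String[] splitBeforeMarkers(String input) {
        return input.split("(?=(Name|!!!))");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, much shorter than the paragraph in the question.
        String input = "Name a !!! Name b !!!";
        for (String part : splitBeforeMarkers(input)) {
            // Brackets make the piece boundaries visible.
            System.out.println("[" + part + "]");
        }
    }
}
```

No characters are consumed by the split, so concatenating the pieces gives back the input, but the last piece is a bare !!! rather than a complete Name ... !!! record.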
Edit: here's the corrected version:
String[] parts = string.split("(?<=!!!)\\s*(?=Name)");
This splits only on the whitespace between !!! and Name and nowhere else, so both parts are kept intact. If you don't want to split on !!!Name (no whitespace in between), replace \\s* with \\s+ so that the match requires one or more whitespace characters instead of zero or more.
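The difference between the two quantifiers can be checked with a short sketch. The input string and the names WhitespaceQuantifierDemo, splitZeroOrMore and splitOneOrMore are made up for illustration; the first !!! in the input is deliberately followed by Name with no whitespace.

```java
class WhitespaceQuantifierDemo {
    // Zero or more whitespace characters between "!!!" and "Name".
    static String[] splitZeroOrMore(String input) {
        return input.split("(?<=!!!)\\s*(?=Name)");
    }

    // One or more whitespace characters between "!!!" and "Name".
    static String[] splitOneOrMore(String input) {
        return input.split("(?<=!!!)\\s+(?=Name)");
    }

    public static void main(String[] args) {
        // Hypothetical input: the first "!!!" is directly followed by "Name".
        String input = "Name a !!!Name b !!!\nName c !!!";
        System.out.println(splitZeroOrMore(input).length); // 3: also splits inside "!!!Name"
        System.out.println(splitOneOrMore(input).length);  // 2: "!!!Name" stays in one piece
    }
}
```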
Edit 2: here is an example of the input and output. The input is copied from the original question:
String string = "Name hhhhh class0" + "\n" + "HHHHHHHHHHHHHHHHHH" + "\n" + "!" + "\n"
        + "Name TTTTT TTTT" + "\n" + "GGGGGG UUUUU IIII" + "\n" + "!" + "\n"
        + "Name JJJJJ WWWW" + "\n" + "IIIIIIIIIIIIIIIIIIIII" + "\n" + "!" + "\n"
        + "RRRRRRRRRRR TTTTTTTT" + "\n" + "HHHHHH" + "\n" + "JJJJJ 1 Name class1" + "\n"
        + "LLLLL 5 Name class5" + "\n" + "!" + "\n" + "OOOOOO HHHH FFFFFF" + "\n"
        + "service 0 Name class12" + "\n" + "!" + "\n" + "JJJJJ YYYYYY 3/0" + "\n" + "KKKKKKK"
        + "\n" + "UUU UUU UUUUU" + "\n" + "QQQQQQQ" + "\n" + "!";
String[] parts = string.split("(?<=!)\\s*(?=Name)");
for (String part : parts) {
    System.out.println(part);
    System.out.println("---------------------------------");
}
Output:
Name hhhhh class0
HHHHHHHHHHHHHHHHHH
!
---------------------------------
Name TTTTT TTTT
GGGGGG UUUUU IIII
!
---------------------------------
Name JJJJJ WWWW
IIIIIIIIIIIIIIIIIIIII
!
RRRRRRRRRRR TTTTTTTT
HHHHHH
JJJJJ 1 Name class1
LLLLL 5 Name class5
!
OOOOOO HHHH FFFFFF
service 0 Name class12
!
JJJJJ YYYYYY 3/0
KKKKKKK
UUU UUU UUUUU
QQQQQQQ
!
---------------------------------
Looks fine?
Try:
import java.util.*;
import java.util.regex.*;

public class Main {

    public static String[] tokenize(String text, String start, String end) {
        // old line: start and end markers may appear anywhere in the text
        //Pattern p = Pattern.compile("(?s)"+Pattern.quote(start)+".*?"+Pattern.quote(end));
        // new line: (?m) makes ^ and $ match at line boundaries, so a token must
        // begin with start at the beginning of a line and finish with end at the end of a line
        Pattern p = Pattern.compile("(?sm)^"+Pattern.quote(start)+".*?"+Pattern.quote(end)+"$");
        Matcher m = p.matcher(text);
        List<String> tokens = new ArrayList<String>();
        while(m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "Name abc" + "\n" +
                "sadghsagh" + "\n" +
                "hsajdjah Name" + "\n" +
                "ggggggggg" + "\n" +
                "!!!" + "\n" +
                "Name ggg" + "\n" +
                "dfdfddfdf Name" + "\n" +
                "!!!" + "\n" +
                "Name hhhh" + "\n" +
                "sahdgashdg Name" + "\n" +
                "asjdhjasdh" + "\n" +
                "sadasldkalskd" + "\n" +
                "asdjhakjsdhja" + "\n" +
                "!!!";
        String[] tokens = tokenize(text, "Name", "!!!");
        int n = 0;
        for(String t : tokens) {
            System.out.println("---------------------------\n"+(++n)+"\n"+t);
        }
    }
}
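To see what the line anchors in the "new line" change, the sketch below runs both patterns on a made-up text in which Name and !!! also occur in the middle of a line. The class AnchorDemo, its helper methods and the sample text are assumptions for illustration only; the two regexes are built the same way as the old and new lines in tokenize.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Collects every match of the given regex in the text.
    static List<String> findAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Same shape as the "old line": markers may appear anywhere.
    static List<String> unanchored(String text) {
        return findAll("(?s)" + Pattern.quote("Name") + ".*?" + Pattern.quote("!!!"), text);
    }

    // Same shape as the "new line": markers must sit at line boundaries.
    static List<String> anchored(String text) {
        return findAll("(?sm)^" + Pattern.quote("Name") + ".*?" + Pattern.quote("!!!") + "$", text);
    }

    public static void main(String[] args) {
        // Hypothetical text with "Name" and "!!!" occurring mid-line.
        String text = "see Name below\nName abc\nline with !!! inside\n!!!\nName ggg\n!!!";
        System.out.println(unanchored(text).get(0)); // starts at the mid-line "Name", stops at the mid-line "!!!"
        System.out.println("----");
        System.out.println(anchored(text).get(0));   // starts at the line "Name abc", ends at the line "!!!"
    }
}
```

On the text used in main above, both patterns return the same three tokens, because every token there already starts at the beginning of a line; the anchors only matter for inputs like the hypothetical one here.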
String s = "Name abc sadghsagh hsajdjah !!! Name ggg dfdfddfdf !!! Name hhhh sahdgashdg asjdhjasdh sadasldkalskd asdjhakjsdhja !!!!! ";
String startsWith = "Name";
String endsWith = "!!!";
// non-greedily get all groups starting with Name and ending with !!!
String pattern = String.format("(%s).*?(%s)", Pattern.quote(startsWith), Pattern.quote(endsWith));
System.out.println(pattern);
Matcher m = Pattern.compile(pattern, Pattern.DOTALL).matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
(\QName\E).*?(\Q!!!\E)
Name abc sadghsagh hsajdjah !!!
Name ggg dfdfddfdf !!!
Name hhhh sahdgashdg asjdhjasdh sadasldkalskd asdjhakjsdhja !!!
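Name and !!! contain no regex metacharacters, so Pattern.quote makes no visible difference in the run above. A minimal sketch of where it matters, using made-up markers "[Name]" and "(end)" (the markers, the class QuoteDemo and the method between are assumptions for illustration): unquoted, "[Name]" would be read as a character class and "(end)" as a group, while quoted they are matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteDemo {
    // Non-greedily collects every span that begins with start and ends with end,
    // treating both markers as literal text.
    static List<String> between(String text, String start, String end) {
        String regex = Pattern.quote(start) + ".*?" + Pattern.quote(end);
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical markers that contain regex metacharacters.
        String text = "[Name] abc\ndef (end) [Name] ggg (end)";
        for (String token : between(text, "[Name]", "(end)")) {
            System.out.println(token);
            System.out.println("---");
        }
    }
}
```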