In Java, is there a simple way to extract a substring by specifying the regular expression delimiters on either side, without including the delimiters in the final substring?
Write a regex like this:
"(regex1)(.*)(regex2)"
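... and pull out the middle group from the matcher, i.e. group(2). To let . match newlines in the input, use Pattern.DOTALL. Note that .* is greedy, so if the right delimiter can occur more than once, use .*? to stop at the first occurrence.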
Using your example we can write a program like:
package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(
            "<row><column>(.*)</column></row>",
            Pattern.DOTALL
        );
        Matcher matcher = p.matcher(
            "<row><column>Header\n\n\ntext</column></row>"
        );
        if (matcher.matches()) {
            System.out.println(matcher.group(1));
        }
    }
}
Which, when run, prints:
Header


text
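For arbitrary delimiters, the same idea can be wrapped in a small helper. This is only a sketch: the class name DelimiterExtractor and the method between are made up for illustration and are not part of the JDK. It assumes you want the first occurrence only, uses a reluctant .*? so the match stops at the nearest right delimiter, and uses a named group (Java 7 and newer) so that capturing groups inside your delimiters do not shift the group index. Be aware that Pattern.DOTALL also applies to any . inside the delimiters themselves.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a JDK class): returns the text between the first
// match of leftRegex and the nearest following match of rightRegex.
final class DelimiterExtractor {

    static Optional<String> between(String input, String leftRegex, String rightRegex) {
        // (?:...) wraps each delimiter without capturing it. The named group
        // "middle" stays addressable even if the delimiters contain groups.
        // .*? is reluctant, so it stops at the first right delimiter.
        Pattern p = Pattern.compile(
            "(?:" + leftRegex + ")(?<middle>.*?)(?:" + rightRegex + ")",
            Pattern.DOTALL
        );
        Matcher m = p.matcher(input);
        // find() searches anywhere in the input; matches() would require the
        // delimiters to sit at the very start and end of the string.
        return m.find() ? Optional.of(m.group("middle")) : Optional.empty();
    }

    public static void main(String[] args) {
        String xml = "<row><column>a</column></row><row><column>b</column></row>";
        // Prints "a": the reluctant quantifier stops at the first </column>.
        System.out.println(between(xml, "<column>", "</column>").orElse("(no match)"));
    }
}
```

With a greedy .* instead, the same call would return everything from the first <column> up to the last </column>, which is rarely what you want when the delimiters repeat.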
You should not use regular expressions to parse XML: this will eventually break if the input is not strictly controlled (for example, when attributes, extra whitespace, or nested elements appear).
The easiest approach is probably to parse the XML into a DOM tree (Java 1.4 and newer include an XML parser) and then navigate the tree to pick out what you need.
Could you tell us what you want to accomplish with your program?
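As a rough illustration of the DOM approach, here is a minimal sketch using the JDK's built-in parser. The class name ColumnReader and the method columnTexts are invented for this example, and the element name "column" is taken from the sample input in this thread. The sketch uses generics and getTextContent(), so it assumes Java 5 or newer rather than 1.4.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: collects the text of every <column> element.
final class ColumnReader {

    static List<String> columnTexts(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse the XML string into an in-memory DOM tree.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // Finds <column> elements at any depth, in document order.
            NodeList columns = doc.getElementsByTagName("column");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < columns.getLength(); i++) {
                texts.add(columns.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Wrapped as unchecked to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<row><column>Header\n\n\ntext</column></row>";
        System.out.println(columnTexts(xml).get(0));
    }
}
```

Unlike the regex version, this keeps working if the elements gain attributes or the surrounding markup changes, and it fails loudly on malformed input instead of silently returning nothing.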