Question
I am trying to extract key-value pairs from a long string in two basic forms, one with and one without quotes, like
... a="First Field" b=SecondField ...
using the Java regular expression
\b(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)\b
However, running the following test code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String input = "a=\"First Field\" b=SecondField";
    String regex = "\\b(a|b)\\s*(?:=)\\s*(\"[^\"]*\"|[^ ]*)\\b";
    Matcher matcher = Pattern.compile(regex).matcher(input);
    while (matcher.find()) {
        System.out.println(matcher.group(1) + " = " + matcher.group(2));
    }
}
the output is
a = "First
b = SecondField
instead of the desired (without quotes)
a = First Field
b = SecondField
In a more generalized input, like
a ="First Field" b=SecondField c3= "Third field value" delta = "" e_value = five!
the output should be (again, without quotes and with varying amounts of white space before and after the = sign):
a = First Field
b = SecondField
c3 = Third field value
delta =
e_value = five!
Is there a regular expression to cover the above use case (at least the version with the 2 keys), or should one resort to string processing?
Even trickier question: if there is such a regex, is there also any way of keeping the index of the matcher group corresponding to the value constant, so that both the quoted field value and the unquoted field value correspond to the same group index?
Answer 1:
Get the matched groups at index 1 and 2:
(\w+)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))
sample code:
String str = "a=\"First Field\" b=SecondField c=\"ThirdField\" d=\"FourthField\"";
Pattern p = Pattern.compile("(\\w+)=(?:\")?(.*?(?=\"?\\s+\\w+=|(?:\"?)$))");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println("key : " + m.group(1) + "\tValue : " + m.group(2));
}
output:
key : a Value : First Field
key : b Value : SecondField
key : c Value : ThirdField
key : d Value : FourthField
If you are looking for just the a and b keys, make a slight change to the regex pattern: replace the first \w+ with a|b.
(a|b)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))
EDIT: Following the edit to the question, simply add \s* on both sides of each = to allow for white space as well.
(\w+)\s*=\s*(?:")?(.*?(?="?\s+\w+\s*=|(?:"?)$))
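As a quick check, the whitespace-tolerant pattern can be run against the generalized input from the question. This is only a minimal sketch: the class name LookaheadKeyValueDemo and the helper parse are made up for illustration and are not part of the answer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the answer.
class LookaheadKeyValueDemo {
    // The whitespace-tolerant pattern from this answer, unchanged.
    private static final Pattern PAIR = Pattern.compile(
            "(\\w+)\\s*=\\s*(?:\")?(.*?(?=\"?\\s+\\w+\\s*=|(?:\"?)$))");

    // Collects key -> value pairs in order of appearance.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            fields.put(m.group(1), m.group(2)); // value is always group 2
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\" "
                + "delta = \"\" e_value = five!";
        parse(input).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because the end of each value is found by looking ahead for the next key and = sign, a quoted value that itself contains text shaped like a key followed by = (for example "x y=z") is cut short at that point.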
Answer 2:
You can modify your regex to the following:
/\b(\w+)\s*=\s*(?:"([^"]*)"|([^ ]*)\b)/
Notable changes:
- You can use \w+ in Java to capture word characters [A-Za-z0-9_].
- You do not need to wrap = in a non-capturing group (?:=).
- The alternation is now wrapped in a non-capturing group.
- The match should only end with a word boundary when not finished by ".
Please see the following code:
{
String input = "a =\"First Field\" b=SecondField c3= \"Third field value\" delta = \"\" e_value = five!";
String regex = "\\b(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^ ]*)\\b)";
Matcher matcher = Pattern.compile(regex).matcher(input);
while (matcher.find())
System.out.println(matcher.group(1) + " = " +
(matcher.group(2) == null ? matcher.group(3) : matcher.group(2)));
}
Output (STDOUT):
a = First Field
b = SecondField
c3 = Third field value
delta =
e_value = five
Note that the trailing \b on the unquoted alternative drops the final ! from five!.
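The null check on groups 2 and 3 may read more clearly with named groups, which java.util.regex supports through the (?<name>...) syntax. The sketch below uses the same pattern as above; the group names key, quoted and bare, the class name and both helper methods are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the one from this answer with names added.
class NamedGroupKeyValueDemo {
    private static final Pattern PAIR = Pattern.compile(
            "\\b(?<key>\\w+)\\s*=\\s*(?:\"(?<quoted>[^\"]*)\"|(?<bare>[^ ]*)\\b)");

    // Returns the value of the current match, whichever alternative matched.
    static String valueOf(Matcher m) {
        String quoted = m.group("quoted"); // null when the unquoted branch matched
        return quoted != null ? quoted : m.group("bare");
    }

    // Builds one "key = value" line per match.
    static List<String> extract(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            lines.add(m.group("key") + " = " + valueOf(m));
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\" "
                + "delta = \"\" e_value = five!";
        extract(input).forEach(System.out::println);
    }
}
```

The value still lives in two different groups; the names only make the fallback explicit. As with the original pattern, the trailing \b means e_value comes out as five rather than five!.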
Answer 3:
Your Java regex "\b(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)\b" will produce the output:
a = "First
b = SecondField
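This is because the position after the closing " is not a word boundary (\b). The quoted alternative therefore never matches for your first name/value pair, and the regex falls back to the unquoted alternative, which stops at the first space.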
You could change it a bit like this:
"\b(a|b)\s*=\s*(?:"([^"]*)"|([^ ]*))"
The complete sample code is listed below:
String input = "a=\"First Field\" b=SecondField";
String regex = "\\b(a|b)\\s*=\\s*(?:\"([^\"]*)\"|([^ ]*))";
Matcher matcher = Pattern.compile(regex).matcher(input);
while (matcher.find()) {
if(matcher.group(2) != null) {
System.out.println(matcher.group(1) + " = " + matcher.group(2));
}else {
System.out.println(matcher.group(1) + " = " + matcher.group(3));
}
}
The output is:
a = First Field
b = SecondField
Also, if your key is not just a or b but any word, you could change (a|b) to (\w+).
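On the question of keeping the value at a constant group index, one possible approach is to move the quotes outside the capturing group and let a lookbehind decide which alternative applies. The pattern below is an illustrative assumption rather than something taken from the answers, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: the value is always captured in group 2.
class SingleGroupKeyValueDemo {
    // "?          optionally consume an opening quote
    // (?<=")[^"]* quoted branch: only valid right after a consumed quote
    // (?<!")\S*   unquoted branch: only valid when no quote was consumed
    // "?          optionally consume the closing quote
    private static final Pattern PAIR = Pattern.compile(
            "\\b(\\w+)\\s*=\\s*\"?((?<=\")[^\"]*|(?<!\")\\S*)\"?");

    // Collects key -> value pairs in order of appearance.
    static Map<String, String> parse(String input) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\" "
                + "delta = \"\" e_value = five!";
        parse(input).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

This sketch does not validate its input: a value with an opening quote but no closing quote is accepted silently, so stricter checking would need extra logic.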
Answer 4:
(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)
Tried with this. Works fine. http://regex101.com/r/zR7cW9/1
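With this pattern, group 2 still includes the surrounding quotes for quoted values, so getting the output the question asks for takes a small string-processing step after matching. A minimal sketch, where the class name and the helpers unquote and extract are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex comes from this answer.
class StripQuotesDemo {
    private static final Pattern PAIR =
            Pattern.compile("(a|b)\\s*(?:=)\\s*(\"[^\"]*\"|[^ ]*)");

    // Removes one pair of enclosing double quotes, if present.
    static String unquote(String raw) {
        if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
            return raw.substring(1, raw.length() - 1);
        }
        return raw;
    }

    // Builds one "key = value" line per match, with quotes stripped.
    static List<String> extract(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            lines.add(m.group(1) + " = " + unquote(m.group(2)));
        }
        return lines;
    }

    public static void main(String[] args) {
        extract("a=\"First Field\" b=SecondField").forEach(System.out::println);
    }
}
```

Since the pattern has no leading \b, (a|b) can also match the last letter of a longer key such as delta, so it is best suited to inputs that contain only the two keys.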
Source: https://stackoverflow.com/questions/25134348/java-regex-to-extract-fields-with-or-without-quotes