Question
Assume a one-line string containing multiple consecutive key-value pairs separated by spaces, where spaces are also allowed within values (but not within keys), e.g.
key1=one two three key2=four key3=five six key4=seven eight nine ten
Correctly extracting the key-value pairs from the above would produce the following mappings:
"key1", "one two three"
"key2", "four"
"key3", "five six"
"key4", "seven eight nine ten"
where "keyX" can be any sequence of characters, excluding space.
Trying something simple, like
([^=]+=[^=]+)+
or similar variations is not adequate.
Is there a regex to fully handle such extraction, without any further string processing?
Answer 1:
Try with a lookahead:
(\b\w+)=(.*?(?=\s\w+=|$))
As a Java String:
"(\\b\\w+)=(.*?(?=\\s\\w+=|$))"
Test at regex101.com; Test at regexplanet (click on "Java")
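As a rough illustration of how the lookahead pattern above could be applied, here is a minimal sketch using java.util.regex. The class name LookaheadKeyValueDemo and the parse helper are made up for this example and are not part of the original answer. Note that \w+ only accepts word characters in keys, which is narrower than the "any non-space characters" allowed by the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the parse() helper are illustrative only.
final class LookaheadKeyValueDemo {
    // Pattern from the answer: group 1 is the key, group 2 the value.
    // The lazy .*? stops as soon as the lookahead sees " nextKey=" or end of input.
    private static final Pattern PAIR =
            Pattern.compile("(\\b\\w+)=(.*?(?=\\s\\w+=|$))");

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps the pairs in the order they appear in the input.
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String input =
                "key1=one two three key2=four key3=five six key4=seven eight nine ten";
        parse(input).forEach((k, v) -> System.out.printf("%s=\"%s\"%n", k, v));
    }
}
```

Run against the sample string from the question, this should print key1="one two three", key2="four", key3="five six" and key4="seven eight nine ten", one per line.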
Answer 2:
\1 contains the key and \2 the value:
(key\d+)=(.*?)(?= key\d+|$)
Escape \ with \\ in Java:
(key\\d+)=(.*?)(?= key\\d+|$)
Demo: https://regex101.com/r/dO8kM2/1
Answer 3:
Rather than a regular expression, I suggest you parse it using indexOf. Something like,
String in = "key1=one two three key2=four key3=five six "
        + "key4=seven eight nine ten";
Map<String, String> kvp = new LinkedHashMap<>();
int prev = 0;
int start;
while ((start = in.indexOf("key", prev)) != -1) {
    // Find the next "=" sign.
    int eqlIndex = in.indexOf("=", start + 3);
    // Find the end... maybe the end of the String.
    int end = in.indexOf("key", eqlIndex + 1);
    if (end == -1) {
        // It's the end of the String.
        end = in.length();
    } else {
        // One less than the next "key"
        end--;
    }
    kvp.put(in.substring(start, eqlIndex),
            in.substring(eqlIndex + 1, end).trim());
    prev = start + 3;
}
for (String key : kvp.keySet()) {
    System.out.printf("%s=\"%s\"%n", key, kvp.get(key));
}
Output is
key1="one two three"
key2="four"
key3="five six"
key4="seven eight nine ten"
Answer 4:
Something like this is also possible if whitespace characters are not duplicated:
([^\\s=]+)=([^=]+(?=\\s|$))
Otherwise, you can always write this:
([^\\s=]+)=([^=]+\\b(?=\\s|$))
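These patterns are a good solution if key names are not too long, since they rely on backtracking: the greedy [^=]+ first runs up to the next "=" and then gives back the following key name one character at a time.
You can also write this, which needs at most one step of backtracking: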
([^\\s=]+)=(\\S+(?>\\s+[^=\\s]+)*(?!=))
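To show how the atomic-group pattern above behaves, including on input with repeated whitespace, here is a small sketch. The class name AtomicGroupKeyValueDemo, the parse helper and the second sample string are assumptions made for this example only. The atomic group (?>...) consumes one whitespace-separated token at a time, so when the negative lookahead (?!=) fails right after the next key name, the engine drops that whole token in a single step.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the parse() helper are illustrative only.
final class AtomicGroupKeyValueDemo {
    // Pattern from the answer: group 1 is the key, group 2 the value.
    // Each (?>\s+[^=\s]+) iteration is one token; (?!=) rejects a token
    // that is immediately followed by "=", i.e. the next key.
    private static final Pattern PAIR =
            Pattern.compile("([^\\s=]+)=(\\S+(?>\\s+[^=\\s]+)*(?!=))");

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps the pairs in the order they appear in the input.
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample string from the question.
        System.out.println(parse(
                "key1=one two three key2=four key3=five six key4=seven eight nine ten"));
        // Made-up input with duplicated whitespace between tokens.
        System.out.println(parse("k1=a  b   k2=c"));
    }
}
```

Whitespace inside a value is kept as-is, while the whitespace separating a value from the next key is left out of the match, so no trim() call is needed. Unlike the \w+ variant, the key group [^\s=]+ accepts any characters other than whitespace and "=".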
Source: https://stackoverflow.com/questions/28131004/regex-to-extract-key-value-pairs-separated-by-space-with-space-in-values