Regex to extract key-value pairs separated by space, with space in values

醉酒当歌 submitted on 2019-12-22 09:40:12

Question


Assume a one-line string containing multiple consecutive key-value pairs separated by a space, where spaces are also allowed within values (but not within keys), e.g.

key1=one two three key2=four key3=five six key4=seven eight nine ten

Correctly extracting the key-value pairs from above would produce the following mappings:

"key1", "one two three"
"key2", "four"
"key3", "five six"
"key4", "seven eight nine ten"

where "keyX" can be any sequence of characters, excluding space.

Trying something simple, like

([^=]+=[^=]+)+

or similar variations is not adequate.
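To see why the simple pattern falls short, it can be run against the sample line. The following is a minimal sketch, assuming Java's `java.util.regex`; the class name `NaivePatternDemo` and the helper `matches` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaivePatternDemo {
    // The simple pattern from the question, unchanged.
    static final Pattern NAIVE = Pattern.compile("([^=]+=[^=]+)+");

    // Collects every full match the pattern finds in the input.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = NAIVE.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "key1=one two three key2=four key3=five six key4=seven eight nine ten";
        // Brackets make the boundaries of each match visible.
        for (String match : matches(input)) {
            System.out.println("[" + match + "]");
        }
    }
}
```

On the sample line this finds only two matches, `key1=one two three key2` and `four key3=five six key4`: because `[^=]+` consumes everything up to the next `=`, each value swallows the following key, and the last pair is lost.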

Is there a regex to fully handle such extraction, without any further string processing?


Answer 1:


Try with a lookahead:

(\b\w+)=(.*?(?=\s\w+=|$))

As a Java String:

"(\\b\\w+)=(.*?(?=\\s\\w+=|$))"

Test at regex101.com; Test at regexplanet (click on "Java")
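For reference, here is one way the lookahead pattern might be applied in Java. This is a sketch rather than part of the original answer: the class `LookaheadKeyValueDemo` and the method `parse` are hypothetical names, and a `LinkedHashMap` is assumed only so that pairs keep their input order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadKeyValueDemo {
    // Pattern from the answer: group 1 is the key, group 2 is the value.
    // The lazy .*? stops as soon as the lookahead sees " nextKey=" or end of input.
    private static final Pattern PAIR =
            Pattern.compile("(\\b\\w+)=(.*?(?=\\s\\w+=|$))");

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "key1=one two three key2=four key3=five six key4=seven eight nine ten";
        parse(input).forEach((k, v) -> System.out.printf("%s=\"%s\"%n", k, v));
    }
}
```

Because the lookahead does not consume the space before the next key, each `find()` resumes at that space and the `\b\w+` part skips over it to reach the next key.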




Answer 2:


\1 contains the key and \2 the value:

(key\d+)=(.*?)(?= key\d+|$)

Escape \ with \\ in Java:

(key\\d+)=(.*?)(?= key\\d+|$)

Demo: https://regex101.com/r/dO8kM2/1
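Note that this pattern only recognizes keys of the literal form `key` followed by digits. The sketch below, with the hypothetical class `FixedPrefixDemo` and method `parse`, runs it on the sample line and on a made-up line containing a differently named key (`name=bob`, an assumption for illustration) to show the difference.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedPrefixDemo {
    // Pattern from the answer: keys must look like key1, key2, ...
    private static final Pattern PAIR =
            Pattern.compile("(key\\d+)=(.*?)(?= key\\d+|$)");

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // group(1) corresponds to \1 (the key), group(2) to \2 (the value).
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample line from the question: all four pairs are extracted.
        System.out.println(parse("key1=one two three key2=four key3=five six key4=seven eight nine ten"));
        // Hypothetical line with a key that does not match key\d+:
        // "name=bob" is absorbed into the value of key1.
        System.out.println(parse("key1=one two name=bob key2=four"));
    }
}
```

If keys can be arbitrary non-space sequences, as the question states, a more general key expression such as `\w+` or `[^\s=]+` would be needed in both the capture group and the lookahead.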




Answer 3:


Rather than a regular expression, I suggest you parse it using indexOf. Something like,

import java.util.LinkedHashMap;
import java.util.Map;

public class IndexOfParser {
    public static void main(String[] args) {
        String in = "key1=one two three key2=four key3=five six "
                + "key4=seven eight nine ten";
        Map<String, String> kvp = new LinkedHashMap<>();
        int prev = 0;
        int start;
        // Assumes every key starts with the literal "key" and no value contains it.
        while ((start = in.indexOf("key", prev)) != -1) {
            // Find the next "=" sign.
            int eqlIndex = in.indexOf("=", start + 3);
            if (eqlIndex == -1) {
                break; // "key" without a following "=": not a pair.
            }
            // Find the end... maybe the end of the String.
            int end = in.indexOf("key", eqlIndex + 1);
            if (end == -1) {
                // It's the end of the String.
                end = in.length();
            }
            // trim() drops the space that separates the value from the next key.
            kvp.put(in.substring(start, eqlIndex),
                    in.substring(eqlIndex + 1, end).trim());
            prev = start + 3;
        }
        for (Map.Entry<String, String> e : kvp.entrySet()) {
            System.out.printf("%s=\"%s\"%n", e.getKey(), e.getValue());
        }
    }
}

Output is

key1="one two three"
key2="four"
key3="five six"
key4="seven eight nine ten"



Answer 4:


Something like this is also possible if whitespace characters are not duplicated (the patterns below are written with Java string escaping):

([^\\s=]+)=([^=]+(?=\\s|$))

Otherwise, you can always add a word boundary before the lookahead:

([^\\s=]+)=([^=]+\\b(?=\\s|$))

These patterns are a reasonable solution if key names are not too long, since they rely on backtracking.

You can also write this, which needs at most one step of backtracking:

([^\\s=]+)=(\\S+(?>\\s+[^=\\s]+)*(?!=))
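The difference between the three patterns shows up when the separator is doubled. The sketch below compares them on a made-up input with two spaces before `key2`; the class `WhitespaceVariantsDemo`, its constant names, and the helper `values` are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceVariantsDemo {
    // The three patterns from the answer, as Java string literals.
    static final String BACKTRACKING = "([^\\s=]+)=([^=]+(?=\\s|$))";
    static final String WORD_BOUNDARY = "([^\\s=]+)=([^=]+\\b(?=\\s|$))";
    static final String ATOMIC = "([^\\s=]+)=(\\S+(?>\\s+[^=\\s]+)*(?!=))";

    // Returns the captured values (group 2) in order of appearance.
    static List<String> values(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two spaces between "two" and "key2" (assumed input for illustration).
        String input = "key1=one two  key2=four";
        for (String regex : new String[] {BACKTRACKING, WORD_BOUNDARY, ATOMIC}) {
            // Brackets make any trailing whitespace in a value visible.
            for (String value : values(regex, input)) {
                System.out.print("[" + value + "] ");
            }
            System.out.println();
        }
    }
}
```

With this input the first pattern keeps a trailing space in the first value (`[one two ]`), while the second and third yield `[one two]`; all three yield `[four]` for the second value.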


Source: https://stackoverflow.com/questions/28131004/regex-to-extract-key-value-pairs-separated-by-space-with-space-in-values
