I have the string below, which is in the format key1=value1, key2=value2, and I need to load it into a Map as key=value pairs.
Given that you have no control over the payload, you need to do something to make the "illegal commas" not match your ", " regex.
Vampire provided a great regex. Since I've already gone down the road of manual parsing, I'll provide a non-regex solution below.
An alternate solution is to manually find the parse/split points yourself by iterating character by character and saving substrings. Keep track of the "last comma-space" until you get to the "next equals" in order to determine whether to split on that comma-space or not.
Here's some code that demonstrates what I'm trying to explain.
import java.util.Arrays;

public class ParseTest {
    static String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";

    public static void main(String[] args) {
        int lastCommaSpace = -2;
        int beginIndex = 0;
        // Iterate over string
        // We are looking for comma-space pairs so we stop one short of end of
        // string
        for (int i = 0; i < payload.length() - 1; i++) {
            if (payload.charAt(i) == ',' && payload.charAt(i + 1) == ' ') {
                // This is the point we want to split at
                lastCommaSpace = i;
            }
            if (payload.charAt(i) == '=' && lastCommaSpace != beginIndex - 2) {
                // We've found the next equals, split at the last comma we saw
                String pairToSplit = payload.substring(beginIndex, lastCommaSpace);
                System.out.println("Split and add this pair:" + Arrays.toString(pairToSplit.split("=", 2)));
                beginIndex = lastCommaSpace + 2;
            }
        }
        // We got to the end, split the last one
        String pairToSplit = payload.substring(beginIndex, payload.length());
        System.out.println("Split and add this pair:" + Arrays.toString(pairToSplit.split("=", 2)));
    }
}
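The code above only prints each pair. Since the goal is a map, here is a minimal sketch of the same scan collecting into a `LinkedHashMap`. The class name `ManualPairParser` and the helper `addPair` are made up for illustration, and the sketch assumes a non-empty payload in which every pair contains an `=`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ManualPairParser {

    // Hypothetical helper: same character scan as above, but the pairs
    // are stored in an insertion-ordered map instead of being printed.
    static Map<String, String> parse(String payload) {
        Map<String, String> result = new LinkedHashMap<>();
        int lastCommaSpace = -2;
        int beginIndex = 0;
        for (int i = 0; i < payload.length() - 1; i++) {
            if (payload.charAt(i) == ',' && payload.charAt(i + 1) == ' ') {
                // Candidate split point; only used if an '=' follows later
                lastCommaSpace = i;
            }
            if (payload.charAt(i) == '=' && lastCommaSpace != beginIndex - 2) {
                // A new key has started, so the last comma-space was a real separator
                addPair(result, payload.substring(beginIndex, lastCommaSpace));
                beginIndex = lastCommaSpace + 2;
            }
        }
        // Whatever remains is the final pair
        addPair(result, payload.substring(beginIndex));
        return result;
    }

    // Split on the first '=' only, so values may themselves contain '='
    private static void addPair(Map<String, String> map, String pair) {
        String[] kv = pair.split("=", 2);
        map.put(kv[0], kv.length > 1 ? kv[1] : "");
    }

    public static void main(String[] args) {
        String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (KHTML, like Gecko) Safari/537.36";
        parse(payload).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Note that this still relies on the same heuristic: if a value contains a comma-space and a later `=` (for example `a=x, y=z` meant as a single value), the scan will split there.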
As you said your keys only contain alphanumerics, the following would probably be a good heuristic for splitting:
payload.split("\\s*,\\s*(?=[a-zA-Z0-9_]+\\s*=|$)");
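This splits on commas, optionally surrounded by whitespace, that are followed either by the end of the string or by an alphanumeric key (underscores are allowed too), optional whitespace, and an equals sign. Commas inside a value, such as the one in "(KHTML, like Gecko)", are not followed by key= and so are left alone.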
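To get from the split result to a map, something like the following sketch should work. The class name `RegexPairParser` and the constant `PAIR_SEPARATOR` are made up for illustration, and it assumes every pair contains at least one `=`, with the first one separating key from value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RegexPairParser {

    // The separator regex from above: a comma followed by "key=" or end of input
    static final String PAIR_SEPARATOR = "\\s*,\\s*(?=[a-zA-Z0-9_]+\\s*=|$)";

    static Map<String, String> parse(String payload) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String pair : payload.split(PAIR_SEPARATOR)) {
            // Limit 2: split on the first '=' only, so "hello/=world" stays intact
            String[] kv = pair.split("=", 2);
            result.put(kv[0].trim(), kv.length > 1 ? kv[1] : "");
        }
        return result;
    }

    public static void main(String[] args) {
        String payload = "cossn=0, abc=hello/=world, Agent=Mozilla/5.0 (KHTML, like Gecko) Safari/537.36";
        parse(payload).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Being a heuristic, this will still mis-split a value that happens to contain a comma followed by a word and an equals sign.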