Java regex to extract fields with or without quotes

Submitted by 眉间皱痕 on 2020-02-06 05:02:10

Question


I am trying to extract key-value pairs from a long string in two basic forms, one with and one without quotes, like

... a="First Field" b=SecondField ...

using the Java regular expression

\b(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)\b

However, running the following test code

public static void main(String[] args) {
  String input = "a=\"First Field\" b=SecondField";
  String regex = "\\b(a|b)\\s*(?:=)\\s*(\"[^\"]*\"|[^ ]*)\\b";
  Matcher matcher = Pattern.compile(regex).matcher(input);
  while (matcher.find()) {
    System.out.println(matcher.group(1) + " = " + matcher.group(2));
  }
}

the output is

a = "First
b = SecondField

instead of the desired (without quotes)

a = First Field
b = SecondField

In a more generalized input, like

a ="First Field" b=SecondField c3= "Third field value" delta = "" e_value  = five!

the output should be (again, without quotes and with varying amounts of white space before and after the = sign)

a = First Field
b = SecondField
c3 = Third field value
delta = 
e_value = five!

Is there a regular expression to cover the above use case (at least the version with the 2 keys), or should one resort to string processing?

Even trickier question: if there is such a regex, is there also any way of keeping the index of the matcher group corresponding to the value constant, so that both the quoted field value and the unquoted field value correspond to the same group index?


Answer 1:


The key is in group 1 and the value is in group 2:

(\w+)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))


sample code:

String str = "a=\"First Field\" b=SecondField c=\"ThirdField\" d=\"FourthField\"";
Pattern p = Pattern.compile("(\\w+)=(?:\")?(.*?(?=\"?\\s+\\w+=|(?:\"?)$))");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println("key : " + m.group(1) + "\tValue : " + m.group(2));
}

output:

key : a Value : First Field
key : b Value : SecondField
key : c Value : ThirdField
key : d Value : FourthField

If you only want the keys a and b, make a slight change to the regex pattern: replace the first \w+ with a|b.

(a|b)=(?:")?(.*?(?="?\s+\w+=|(?:"?)$))



EDIT

Following the edit to the question, simply add \s* around the = sign to allow for white space as well:

(\w+)\s*=\s*(?:")?(.*?(?="?\s+\w+\s*=|(?:"?)$))
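
One limitation worth knowing: the lookahead in the patterns above ends a value where the next key= appears to start, so a quoted value that itself contains something like y=z after a space would be cut short. As an alternative sketch only (the class name SingleGroupPairs and the method parse are invented for illustration and are not part of the original answer), lookbehinds can also keep the value in group 2 for both the quoted and the unquoted form. It assumes that unquoted values contain no white space and that quoted values contain no embedded quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleGroupPairs {
    // Group 1 is the key; group 2 is the value, whether quoted or not.
    // (?<=") succeeds only directly after an opening quote,
    // (?<!") only when no opening quote was consumed.
    static final Pattern PAIR =
        Pattern.compile("(\\w+)\\s*=\\s*\"?((?<=\")[^\"]*|(?<!\")\\S*)\"?");

    // Collects all key-value pairs in order of appearance.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\""
                     + " delta = \"\" e_value  = five!";
        parse(input).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

For the generalised input from the question this prints the five desired lines, including an empty value for delta and e_value = five!.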





Answer 2:


You can modify your regex to the following:

/\b(\w+)\s*=\s*(?:"([^"]*)"|([^ ]*)\b)/

Notable changes:

  • You can use \w+ in Java to match word characters [A-Za-z0-9_].
  • You do not need to wrap = in a non-capturing group (?:=).
  • The alternation is now wrapped in a non-capturing group.
  • The match only has to end at a word boundary when the value is not closed by a quote (").

Please see the following code:

{
    String input = "a =\"First Field\" b=SecondField c3= \"Third field value\" delta = \"\" e_value  = five!";
    String regex = "\\b(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|([^ ]*)\\b)";
    Matcher matcher = Pattern.compile(regex).matcher(input);
    while (matcher.find())
        System.out.println(matcher.group(1) + " = " +
        (matcher.group(2) == null ? matcher.group(3) : matcher.group(2)));
}


Code demo STDOUT:

a = First Field
b = SecondField
c3 = Third field value
delta = 
e_value = five
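
Note that the last line of the output above reads five rather than five!: the trailing \b forces the unquoted branch to end on a word character, so the final ! is given back. A possible variation, sketched here under the assumption that unquoted values never contain white space, drops the boundary and uses \S* instead. The class name KeepTrailingPunctuation and the method format are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeepTrailingPunctuation {
    // Same structure as the regex above, minus the trailing \b:
    // group 2 holds a quoted value, group 3 an unquoted one.
    static final Pattern PAIR =
        Pattern.compile("\\b(\\w+)\\s*=\\s*(?:\"([^\"]*)\"|(\\S*))");

    // Returns one "key = value" line per pair found in the input.
    static List<String> format(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Exactly one of the two value groups participates in each match.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            lines.add(m.group(1) + " = " + value);
        }
        return lines;
    }

    public static void main(String[] args) {
        String input = "a =\"First Field\" b=SecondField c3= \"Third field value\""
                     + " delta = \"\" e_value  = five!";
        format(input).forEach(System.out::println);
    }
}
```

With this change the last line becomes e_value = five!, while the other four lines stay the same.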



Answer 3:


Your Java regex "\b(a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)\b" will produce the output:

a = "First
b = SecondField

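This is because there is no \b word boundary after the closing '"': it is followed by a space, and both are non-word characters. The quoted alternative therefore never matches, and the regex falls back to [^ ]*, which stops at "First.
You could change it a bit like this: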

"\b(a|b)\s*=\s*(?:"([^"]*)"|([^ ]*))"

The whole sample code is listed as below:

String input = "a=\"First Field\" b=SecondField";
String regex = "\\b(a|b)\\s*=\\s*(?:\"([^\"]*)\"|([^ ]*))";
Matcher matcher = Pattern.compile(regex).matcher(input);
while (matcher.find()) {
    if (matcher.group(2) != null) {
        System.out.println(matcher.group(1) + " = " + matcher.group(2));
    } else {
        System.out.println(matcher.group(1) + " = " + matcher.group(3));
    }
}

The output is like:

a = First Field
b = SecondField

Meanwhile, if your key is not just 'a' or 'b' but any word, you could change (a|b) to (\w+).
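
The word-boundary behaviour described in this answer can be checked in isolation. The sketch below applies each branch of the original alternation, followed by \b, to the start of the quoted value using lookingAt(). The class name WordBoundaryDemo and the helper prefixMatch are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Returns the prefix of input matched by regex,
    // or null if the regex does not match at position 0.
    static String prefixMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        String value = "\"First Field\" b=SecondField";

        // Quoted branch plus \b: the closing quote is followed by a space.
        // Both are non-word characters, so there is no boundary and the branch fails.
        System.out.println(prefixMatch("\"[^\"]*\"\\b", value)); // prints null

        // Unquoted branch plus \b: [^ ]* stops before the first space,
        // where 't' followed by ' ' does form a boundary.
        System.out.println(prefixMatch("[^ ]*\\b", value)); // prints "First
    }
}
```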




Answer 4:


    (a|b)\s*(?:=)\s*("[^"]*"|[^ ]*)

Tried with this. Works fine. http://regex101.com/r/zR7cW9/1
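
With this pattern the value always sits in group 2, but a quoted value keeps its surrounding quotes. A minimal sketch for stripping them afterwards follows; the class name StripQuotes and the helpers unquote and format are hypothetical and not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripQuotes {
    // The regex from this answer, unchanged: group 2 may include the quotes.
    static final Pattern PAIR =
        Pattern.compile("(a|b)\\s*(?:=)\\s*(\"[^\"]*\"|[^ ]*)");

    // Drops one pair of enclosing double quotes, if present.
    static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Returns one "key = value" line per pair, with quotes removed.
    static List<String> format(String input) {
        List<String> lines = new ArrayList<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            lines.add(m.group(1) + " = " + unquote(m.group(2)));
        }
        return lines;
    }

    public static void main(String[] args) {
        format("a=\"First Field\" b=SecondField").forEach(System.out::println);
    }
}
```

This trades a slightly simpler regex for one extra string operation per match.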



Source: https://stackoverflow.com/questions/25134348/java-regex-to-extract-fields-with-or-without-quotes
