Regex for splitting a string using space when not surrounded by single or double quotes

梦毁少年i 2020-11-22 03:15

I'm new to regular expressions and would appreciate your help. I'm trying to put together an expression that will split the example string using all spaces that are not s

15 answers
  • 2020-11-22 03:37

    The regex from Jan Goyvaerts is the best solution I have found so far, but it also produces empty (null) matches, which he filters out in his program. These empty matches also show up in regex testers (e.g. rubular.com). If you turn the search around (look for the quoted parts first, then the space-separated words), you can do it in one pass with:

    ("[^"]*"|'[^']*'|[\S]+)+
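
A minimal Java sketch of how the pattern above might be applied, assuming the goal is to collect each match with `Matcher.find()` rather than to split. The class and method names (`QuoteAwareTokenizer`, `tokenize`) are hypothetical and only serve the illustration; the regex is the one from this answer, and quotes are kept in the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokenizer {
    // Regex from the answer above: the quoted alternatives are tried first,
    // so a quoted phrase is matched as a whole before \S+ is considered.
    private static final Pattern TOKEN =
            Pattern.compile("(\"[^\"]*\"|'[^']*'|[\\S]+)+");

    // Hypothetical helper: returns every match, quotes included.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "This is a string that \"will be\" highlighted when your "
                + "'regular expression' matches something.";
        for (String token : tokenize(s)) {
            System.out.println(token);
        }
    }
}
```

Every alternative needs at least one character, so this loop yields no empty matches. An unbalanced quote falls through to the `\S+` branch and is treated as part of an ordinary word.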
    
  • 2020-11-22 03:38

    You can also try this:

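        String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something";
        // Split on either quote character: even-indexed pieces lie outside quotes,
        // odd-indexed pieces are the quoted phrases (assumes balanced quotes and
        // no apostrophes inside unquoted words).
        String[] ss = str.split("\"|'");
        for (int i = 0; i < ss.length; i++) {
            if ((i % 2) == 0) { // even: unquoted text, split on whitespace
                for (String pp1 : ss[i].trim().split("\\s+")) {
                    if (!pp1.isEmpty()) { // skip the empty piece left by blank segments
                        System.out.println(pp1);
                    }
                }
            } else { // odd: quoted phrase, print as-is
                System.out.println(ss[i]);
            }
        }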
    
  • 2020-11-22 03:43
    (?<!\G".{0,99999})\s|(?<=\G".{0,99999}")\s
    

    This matches the spaces that are not surrounded by double quotes. I have to use the bounded quantifier {0,99999} because Java doesn't support * and + in lookbehind.

  • 2020-11-22 03:43

    String.split() is not helpful here because there is no way to distinguish between spaces within quotes (don't split) and those outside (split). Matcher.lookingAt() is probably what you need:

    String str = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
    str = str + " "; // add trailing space
    int len = str.length();
    Matcher m = Pattern.compile("((\"[^\"]+?\")|('[^']+?')|([^\\s]+?))\\s++").matcher(str);
    
    for (int i = 0; i < len; i++)
    {
        m.region(i, len);
    
        if (m.lookingAt())
        {
            String s = m.group(1);
    
            if ((s.startsWith("\"") && s.endsWith("\"")) ||
                (s.startsWith("'") && s.endsWith("'")))
            {
                s = s.substring(1, s.length() - 1);
            }
    
            System.out.println(i + ": \"" + s + "\"");
            i += (m.group(0).length() - 1);
        }
    }
    

    which produces the following output:

    0: "This"
    5: "is"
    8: "a"
    10: "string"
    17: "that"
    22: "will be"
    32: "highlighted"
    44: "when"
    49: "your"
    54: "regular expression"
    75: "matches"
    83: "something."
    
  • 2020-11-22 03:43

    If you are using C#, you can use:

    string input= "This is a string that \"will be\" highlighted when your 'regular expression' matches <something random>";
    
    List<string> list1 = 
                    Regex.Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""|'(?<match>[\w\s]*)'|<(?<match>[\w\s]*)>").Cast<Match>().Select(m => m.Groups["match"].Value).ToList();
    
    foreach(var v in list1)
       Console.WriteLine(v);
    

    I have specifically added "|<(?<match>[\w\s]*)>" to show that you can use any pair of characters to group phrases (in this case, < and >).

    The output is:

    This
    is
    a
    string
    that
    will be
    highlighted
    when
    your
    regular expression 
    matches
    something random
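
For readers working in Java, here is a rough counterpart as a sketch rather than a port. Java rejects a pattern that defines the same group name twice, so this version assumes one numbered group per delimiter pair and takes whichever group matched. The class and method names (`DelimitedTokenizer`, `tokenize`) are hypothetical, and the character classes are looser than the `\w` and `[\w\s]` used in the C# pattern, so punctuation stays attached to words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimitedTokenizer {
    // Groups 1-3 capture the text inside "...", '...' and <...>;
    // group 4 captures any other run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile(
            "\"([^\"]*)\"|'([^']*)'|<([^>]*)>|(\\S+)");

    // Hypothetical helper: returns tokens with their delimiters stripped.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one alternative matched; take its group.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    tokens.add(m.group(g));
                    break;
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "This is a string that \"will be\" highlighted when your "
                + "'regular expression' matches <something random>";
        for (String token : tokenize(input)) {
            System.out.println(token);
        }
    }
}
```

Adding another delimiter pair means adding one more alternative before the `\S+` fallback; the loop over groups does not need to change.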
    
  • 2020-11-22 03:44

    Jan's approach is great but here's another one for the record.

    If you actually want to split as mentioned in the title, keeping the quotes in "will be" and 'regular expression', then you could use this method, which is taken straight from the question "Match (or replace) a pattern except in situations s1, s2, s3 etc".

    The regex:

    '[^']*'|\"[^\"]*\"|( )
    

    The two left alternations match complete 'quoted strings' and "double-quoted strings". We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expressions on the left. We replace those with SplitHere then split on SplitHere. Again, this is for a true split case where you want "will be", not will be.

    Here is a full working implementation (see the results on the online demo).

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class Program {
        public static void main(String[] args) {
            String subject = "This is a string that \"will be\" highlighted when your 'regular expression' matches something.";
            Pattern regex = Pattern.compile("'[^']*'|\"[^\"]*\"|( )");
            Matcher m = regex.matcher(subject);
            StringBuffer b = new StringBuffer();
            while (m.find()) {
                if (m.group(1) != null) {
                    // A space outside quotes: mark it as a split point.
                    m.appendReplacement(b, "SplitHere");
                } else {
                    // A quoted string: copy it back unchanged. quoteReplacement
                    // keeps any '$' or '\' in the match from being read as a
                    // group reference.
                    m.appendReplacement(b, Matcher.quoteReplacement(m.group(0)));
                }
            }
            m.appendTail(b);
            String replaced = b.toString();
            String[] splits = replaced.split("SplitHere");
            for (String split : splits) System.out.println(split);
        } // end main
    } // end Program
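
As a variation on the implementation above, the same regex can drive the split directly from match offsets, which avoids the SplitHere marker (and any clash if the input itself happens to contain that word). This is only a sketch under that assumption; the class and method names (`QuoteAwareSplit`, `split`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // Same regex as above: quoted strings are matched and skipped,
    // and only a space outside quotes is captured in group 1.
    private static final Pattern P =
            Pattern.compile("'[^']*'|\"[^\"]*\"|( )");

    // Hypothetical helper: cuts the subject at every space captured in group 1.
    static List<String> split(String subject) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(subject);
        int start = 0; // start of the piece currently being collected
        while (m.find()) {
            if (m.group(1) != null) {
                parts.add(subject.substring(start, m.start()));
                start = m.end();
            }
        }
        parts.add(subject.substring(start)); // text after the last split point
        return parts;
    }

    public static void main(String[] args) {
        String subject = "This is a string that \"will be\" highlighted when your "
                + "'regular expression' matches something.";
        for (String part : split(subject)) {
            System.out.println(part);
        }
    }
}
```

Unlike `String.split`, this sketch keeps trailing empty strings, and consecutive spaces produce empty pieces; filter them out if that is not wanted.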
    