RegEx: split a string on a delimiter (semicolon ;) except where it appears inside a string

守給你的承諾、 submitted on 2019-12-06 09:51:52

The regular expression pattern ((?:(?:'[^']*')|[^;])*); should give you what you need. Use a while loop and Matcher.find() to extract all the SQL statements. Something like:

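import java.util.regex.Matcher;
import java.util.regex.Pattern;

// s holds the SQL text to split
Pattern p = Pattern.compile("((?:(?:'[^']*')|[^;])*);");
Matcher m = p.matcher(s);
int cnt = 0;
while (m.find()) {
    // group(1) is the statement without its terminating ;
    System.out.println(++cnt + ": " + m.group(1));
}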

Using the sample SQL you provided, this will output:

1: CREATE OR REPLACE PROCEDURE Proc
   AS
        b NUMBER:=3
2: 
        c VARCHAR2(2000)
3: 
    begin
        c := 'BEGIN ' || ' :1 := :1 + :2; ' || 'END;'
4: 
   end Proc

If you want to get the terminating ;, use m.group(0) instead of m.group(1).
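For anyone who wants to try the loop above without the original sample SQL, here is a self-contained sketch. The class name QuotedSemicolonSplit, the helper split, and the sample input are hypothetical; only the pattern comes from this answer. It collects group(1), so the terminating ; is dropped, and any text after the last ; is not returned because every match must end in ;.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the one from the answer.
class QuotedSemicolonSplit {
    // A statement is a run of '...' literals or non-';' characters, ended by ';'.
    private static final Pattern STATEMENT =
            Pattern.compile("((?:(?:'[^']*')|[^;])*);");

    // Returns each statement without its terminating ';'.
    // Text after the last ';' is not returned, because every match needs a ';'.
    static List<String> split(String sql) {
        List<String> statements = new ArrayList<>();
        Matcher m = STATEMENT.matcher(sql);
        while (m.find()) {
            statements.add(m.group(1)); // use m.group(0) to keep the ';'
        }
        return statements;
    }

    public static void main(String[] args) {
        String sql = "x := 'a;b'; y := 1; z";
        int cnt = 0;
        for (String stmt : split(sql)) {
            System.out.println(++cnt + ": " + stmt);
        }
    }
}
```

With this input the ; inside 'a;b' is kept, giving the two statements "x := 'a;b'" and " y := 1"; the trailing " z" is not reported.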

For more information on regular expressions, see the Pattern JavaDoc and this great reference. Here's a synopsis of the pattern:

(              Start capturing group
  (?:          Start non-capturing group
    (?:        Start non-capturing group
      '        Match the literal character '
      [^']     Match a single character that is not '
      *        Greedily match the previous atom zero or more times
      '        Match the literal character '
    )          End non-capturing group
    |          Match either the previous or the next atom
    [^;]       Match a single character that is not ;
  )            End non-capturing group
  *            Greedily match the previous atom zero or more times
)              End capturing group
;              Match the literal character ;

What you might try is simply splitting on ";". Then, for each piece, if it contains an odd number of ' characters, concatenate it with the following pieces (adding the ";" back in) until the combined text contains an even number of ' characters.
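A minimal sketch of this split-and-rejoin idea, assuming ' is the only string delimiter and ignoring comments (a ' inside a comment would throw the count off). The class SplitAndRejoin and its helpers are hypothetical names. Because SQL's doubled quote '' adds two quotes, it does not change the parity, so 'it''s;x' stays in one piece.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "split on ';', then rejoin pieces with unbalanced quotes".
class SplitAndRejoin {
    // Counts the ' characters in s.
    private static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\'') {
                n++;
            }
        }
        return n;
    }

    static List<String> split(String sql) {
        // limit -1 keeps trailing empty strings, so no ';' is silently lost
        String[] pieces = sql.split(";", -1);
        List<String> statements = new ArrayList<>();
        StringBuilder current = null;
        int quotes = 0;
        for (String piece : pieces) {
            if (current == null) {
                current = new StringBuilder(piece);
                quotes = countQuotes(piece);
            } else {
                // Still inside a literal: that ';' belonged to it, so restore it.
                current.append(';').append(piece);
                quotes += countQuotes(piece);
            }
            if (quotes % 2 == 0) {
                statements.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            // Unterminated literal: keep the remainder as the last statement.
            statements.add(current.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        for (String stmt : split("x := 'a;b'; y := 1; z")) {
            System.out.println("[" + stmt + "]");
        }
    }
}
```

Unlike the regex loop, this also returns the text after the last ; (an empty string if the input ends with ;), so callers may want to skip blank entries.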

I was having the same issue. I looked at the previous recommendations and decided to improve the handling of:

  • Comments
  • Escaped single quotes
  • Single queries not terminated by a semicolon

My solution is written for Java. Some details, such as backslash escaping and DOTALL mode, may differ from one language to another.

This worked for me: "(?s)\s*((?:'(?:\\.|[^\\']|'')*'|/\*.*?\*/|(?:--|#)[^\r\n]*|[^\\'])*?)(?:;|$)"

"
(?s)                 DOTALL mode: the dot also matches \r and \n
\\s*                 Initial whitespace
(
    (?:              Grouping content of a valid query
        '            Open string literal
        (?:          Grouping content of a string literal expression
            \\\\.    Any escaped character. Doesn't matter if it's a single quote
        |
            [^\\\\'] Any character other than a backslash or a single quote. Escapes are covered above.
        |
            ''       Escaped single quote
        )            Any of these regexps are valid in a string literal.
        *            The string can be empty 
        '            Close string literal
    |
        /\\*         C-style comment start
        .*?          Any characters, but as few as possible (doesn't include */)
        \\*/         C-style comment end
    |
        (?:--|#)     SQL comment start
        [^\r\n]*     Rest of a one-line comment, up to the newline
    |
        [^\\\\']     Anything which doesn't have to do with a string literal
    )                These four tokens define the contents of a query
    *?               Lazy: take as few tokens as possible, so the query ends at the first ; or end of text
)
(?:;|$)              After a series of query tokens, find ; or EOT
"
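To try the pattern described in the synopsis above, here is a sketch of a small wrapper. The class name SqlStatementSplitter and the sample SQL are hypothetical. Two behaviors worth knowing: comments stay attached to the statement that follows them (they are only protected from splitting, not removed), and the pattern also produces empty matches at the end of the text; trimming each match and skipping empty ones is an addition here, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the pattern from the answer.
class SqlStatementSplitter {
    // The regex from the synopsis, written as a Java string literal.
    // (?s) is equivalent to compiling with Pattern.DOTALL.
    private static final Pattern STATEMENT = Pattern.compile(
            "(?s)\\s*((?:'(?:\\\\.|[^\\\\']|'')*'|/\\*.*?\\*/|(?:--|#)[^\\r\\n]*|[^\\\\'])*?)(?:;|$)");

    static List<String> split(String sql) {
        List<String> statements = new ArrayList<>();
        Matcher m = STATEMENT.matcher(sql);
        while (m.find()) {
            // Trim and skip empty matches (e.g. the one at the end of the text);
            // this filtering is an addition, not part of the original pattern.
            String stmt = m.group(1).trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String sql = "SELECT 'a;b' FROM t; -- note; here\n"
                + "INSERT INTO t VALUES ('it''s;ok'); /* c; */ SELECT 2";
        for (String stmt : split(sql)) {
            System.out.println("[" + stmt + "]");
        }
    }
}
```

On this input the semicolons inside the literals, the -- comment and the /* */ comment do not split anything, and the last statement is returned even though no ; follows it. Like most backtracking patterns built from repeated alternations, this one may hit a StackOverflowError in Java on very long scripts, so test it on realistic input sizes.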

As for your second case, note that the last part of the regexp defines how a query ends. Right now it accepts only a semicolon or the end of the text, but you can add whatever terminators you need. For example, (?:;|@|/|$) also accepts @ and / as terminators. I haven't tested this for your case, but it shouldn't be hard.
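One way to try different endings, sketched under assumptions: keep everything before (?:;|$) fixed and pass the terminator group in as a string. The class ConfigurableSplitter and the terminators parameter are hypothetical. One caution from how *? works: the terminator is tried before each new token, so if / is a terminator, the / that opens a /* comment would end the query right there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: the terminator group is a parameter instead of (?:;|$).
class ConfigurableSplitter {
    // The answer's pattern up to, but not including, the terminator group.
    private static final String BODY =
            "(?s)\\s*((?:'(?:\\\\.|[^\\\\']|'')*'|/\\*.*?\\*/|(?:--|#)[^\\r\\n]*|[^\\\\'])*?)";

    // terminators is a regex alternation such as "(?:;|@|$)"; keeping $ means
    // a final statement without a terminator is still returned.
    static List<String> split(String sql, String terminators) {
        Pattern p = Pattern.compile(BODY + terminators);
        List<String> statements = new ArrayList<>();
        Matcher m = p.matcher(sql);
        while (m.find()) {
            // Trim, and skip the empty matches produced at the end of the text.
            String stmt = m.group(1).trim();
            if (!stmt.isEmpty()) {
                statements.add(stmt);
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        String sql = "SELECT 1@ SELECT 'a@b'; SELECT 3";
        for (String stmt : split(sql, "(?:;|@|$)")) {
            System.out.println(stmt);
        }
    }
}
```

Here @ ends the first statement, the @ inside 'a@b' is protected by the string-literal alternative, and the last statement ends at the end of the text.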
