Regex to remove single-line SQL comments (--)

悲哀的现实 2021-01-15 08:52

Question:

Can anybody give me a working regex (C#/VB.NET) that removes single-line comments from a SQL statement?

I mean these comments:

7 answers
  •  太阳男子
    2021-01-15 09:15

    I will disappoint all of you: this can't be done with regular expressions. Sure, it's easy to find comments that are not in a string (even the OP could do that); the real problem is a "--" inside a string. Lookarounds offer a little hope, but they are still not enough. Knowing that a quote precedes the "--" on the line guarantees nothing. The only thing that tells you anything is the parity of the number of quotes before it, and that is something you can't find with a regular expression. So simply go with a non-regex approach.
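    To make the problem concrete, here is a small sketch of what goes wrong with the obvious pattern. The class and method names (NaiveRegexDemo, StripNaively) and the sample statement are made up for illustration; the pattern simply deletes everything from "--" to the end of the line.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class NaiveRegexDemo
    {
        // Naive approach: delete everything from the first "--" to the end of each line.
        // It has no idea whether the "--" sits inside a quoted string.
        public static string StripNaively(string sql)
        {
            return Regex.Replace(sql, "--.*$", "", RegexOptions.Multiline);
        }
    }
    ```

    Calling NaiveRegexDemo.StripNaively("select name from t where note like '--keep me' --drop me") returns "select name from t where note like '": the match starts at the "--" inside the string literal, so the statement is cut in half.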

    EDIT: Here's the C# code for a non-regex approach:

            //needs "using System;" at the top of the file
            String sql = "--this is a test\r\nselect stuff where substaff like '--this comment should stay' --this should be removed\r\n";
            char[] quotes = { '\'', '"'};
            int newCommentLiteral, lastCommentLiteral = 0;
            while ((newCommentLiteral = sql.IndexOf("--", lastCommentLiteral)) != -1)
            {
                int countQuotes = sql.Substring(lastCommentLiteral, newCommentLiteral - lastCommentLiteral).Split(quotes).Length - 1;
                if (countQuotes % 2 == 0) //this is a comment, since there's an even number of quotes preceding
                {
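                    int eol = sql.IndexOf("\r\n", newCommentLiteral); //search from the comment, not from the start of the string
                    if (eol == -1)
                        eol = sql.Length; //no more newline, meaning end of the string
                    else
                        eol += 2; //also remove the line break itself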
                    sql = sql.Remove(newCommentLiteral, eol - newCommentLiteral);
                    lastCommentLiteral = newCommentLiteral;
                }
                else //this is within a string, find string ending and moving to it
                {
                    int singleQuote = sql.IndexOf("'", newCommentLiteral);
                    if (singleQuote == -1)
                        singleQuote = sql.Length;
                    int doubleQuote = sql.IndexOf('"', newCommentLiteral);
                    if (doubleQuote == -1)
                        doubleQuote = sql.Length;
    
                    lastCommentLiteral = Math.Min(Math.Min(singleQuote, doubleQuote) + 1, sql.Length); //stay in range when the string is never closed
    
                    //instead of finding the end of the string you could simply do += 2 but the program will become slightly slower
                }
            }
    
            Console.WriteLine(sql);
    

    What this does: it finds every comment literal ("--"). For each one, it checks whether it is inside a string by counting the quotes between the current match and the previous position. If the count is even, it's a comment, so it is removed (find the next end of line and remove everything in between). If it's odd, the "--" is inside a string, so find the end of the string and move past it. This snippet relies on a weird SQL trick: 'this" is a valid string, even though the two quotes differ. If that's not true for your SQL dialect, you should try a completely different approach. I'll write a program for that too if that's the case, but this one is faster and more straightforward.
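    For dialects where a string must be closed by the same quote character that opened it, one possible alternative is a single pass over the characters that remembers which quote is currently open. This is only a minimal sketch under that assumption, not the program promised above: the names SqlCommentStripper and Strip are made up for illustration, block comments (/* */) are not handled, and the line break after a comment is kept rather than removed.

    ```csharp
    using System;
    using System.Text;

    public static class SqlCommentStripper
    {
        // Removes "--" comments that start outside a quoted string.
        // The line break that ends a comment is left in place.
        public static string Strip(string sql)
        {
            var result = new StringBuilder(sql.Length);
            char openQuote = '\0'; // '\0' means "not inside a string"
            int i = 0;
            while (i < sql.Length)
            {
                char c = sql[i];
                if (openQuote != '\0')
                {
                    // Inside a string: only the same quote character closes it.
                    // A doubled quote ('') closes and immediately reopens the string.
                    if (c == openQuote) openQuote = '\0';
                    result.Append(c);
                    i++;
                }
                else if (c == '\'' || c == '"')
                {
                    openQuote = c; // a string starts here
                    result.Append(c);
                    i++;
                }
                else if (c == '-' && i + 1 < sql.Length && sql[i + 1] == '-')
                {
                    // Comment: skip ahead to the line break, or to the end of the input.
                    while (i < sql.Length && sql[i] != '\r' && sql[i] != '\n') i++;
                }
                else
                {
                    result.Append(c);
                    i++;
                }
            }
            return result.ToString();
        }
    }
    ```

    With the sample statement from the snippet above, SqlCommentStripper.Strip(sql) would drop both comments and keep the '--this comment should stay' literal intact.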
