I\'m trying to come up with a regular expression to remove comments from an SQL statement.
This regex almost works:
(/\\*([^*]|[\\r\\n]|(\\*+([^*/]|
Originally, I used @Adrien Gibrat's solution. However, I came across a situation where it wasn't parsing quoted strings, properly, if I had anything with a preceding '--' inside of them. I ended up writing this, instead:
'[^']*(?!\\)'(*SKIP)(*F) # Make sure we're not matching inside of quotes
|(?m-s:\s*(?:\-{2}|\#)[^\n]*$) # Single line comment
|(?:
\/\*.*?\*\/ # Multi-line comment
(?(?=(?m-s:\h+$)) # Get trailing whitespace if any exists and only if it's the rest of the line
\h+
)
)
# Modifiers used: 'xs' ('g' can be used as well, but is enabled by default in PHP)
Please note that this should be used when PCRE is available. So, in my case, I'm using a variation of this in my PHP library.
Example
This code works for me:
function strip_sqlcomment ($string = '') {
$RXSQLComments = '@(--[^\r\n]*)|(\#[^\r\n]*)|(/\*[\w\W]*?(?=\*/)\*/)@ms';
return (($string == '') ? '' : preg_replace( $RXSQLComments, '', $string ));
}
with a little regex tweak it could be used to strip comments in any language
remove /**/ and -- comments
function unComment($sql){
$re = '/(--[^\n]*)/i';
$sql = preg_replace( $re, '', $sql );
$sqlComments = '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$sql = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', trim($uncommentedSQL));
return $sql;
}
As you said that the rest of your regex is fine, I focused on the last part. All you need to do is verify that the --
is at the beginning and then make sure it removes all dashes if there are more than 2. The end regex is below
(^[--]+)
The above is just if you want to remove the comment dashes and not the whole line. You can run the below if you do want everything after it to the end of the line, also
(^--.*)
For all PHP folks: please use this library - https://github.com/jdorn/sql-formatter. I have been dealing with stripping comments from SQL for couple years now and the only valid solution would be a tokenizer/state machine, which I lazily resisted to write. Couple days ago I found out this lib and ran 120k queries through it and found only one bug (https://github.com/jdorn/sql-formatter/issues/93), which is fixed immediately in our fork https://github.com/keboola/sql-formatter.
The usage is simple
$query <<<EOF
/*
my comments
*/
SELECT 1;
EOF;
$bareQuery = \SqlFormatter::removeComments($query);
// prints "SELECT 1;"
print $bareQuery;
In PHP, i'm using this code to uncomment SQL:
$sqlComments = '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms';
/* Commented version
$sqlComments = '@
(([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
|( # $3 : Match comments
(?:\#|--).*?$ # - Single line comments
| # - Multi line (nested) comments
/\* # . comment open marker
(?: [^/*] # . non comment-marker characters
|/(?!\*) # . ! not a comment open
|\*(?!/) # . ! not a comment close
|(?R) # . recursive case
)* # . repeat eventually
\*\/ # . comment close marker
)\s* # Trim after comments
|(?<=;)\s+ # Trim after semi-colon
@msx';
*/
$uncommentedSQL = trim( preg_replace( $sqlComments, '$1', $sql ) );
preg_match_all( $sqlComments, $sql, $comments );
$extractedComments = array_filter( $comments[ 3 ] );
var_dump( $uncommentedSQL, $extractedComments );