Question
This is related to a previous question, here: Converting a \u escaped Unicode string to ASCII
I proposed a solution involving eval(parse(text=x)), which, for non-R users, means exactly what it says: parse the text string, then evaluate it. The aim was not to allow arbitrary code to be executed, but only to unescape escaped Unicode text. Hence the solution:
eval(parse(text=paste0("'", x, "'")))
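For concreteness, here is a minimal sketch of that approach applied to benign input. The helper name unescape_via_eval is hypothetical, and no sanitisation is applied yet, so it should only ever see trusted strings.

```r
# Hypothetical helper wrapping the eval(parse()) approach above.
# No sanitisation: only safe for trusted input.
unescape_via_eval <- function(x) {
  eval(parse(text = paste0("'", x, "'")))
}

x <- "caf\\u00e9"       # 9 characters: c a f \ u 0 0 e 9
nchar(x)                # 9
unescape_via_eval(x)    # "café"
```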
While this should be fairly safe given the restricted objective, I'd be interested to know: how much sanitisation is required to keep things safe?
At a minimum, I guess any embedded single and double quotes have to be escaped. For example, suppose we have
x <- "this is a '; print(dir()); 'string"
Then eval'ing this per the snippet above would execute the code in the middle. So we have to escape the quotes:
eval(parse(text=paste0("'",
gsub("'", "\\\\'", x),
"'")))
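And similarly for double quotes. I don't think the Unicode escape equivalents \u0022 and \u0027 are a problem, since the parser decodes them as escape sequences inside the string literal: they become " and ' characters in the resulting value rather than acting as string delimiters.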
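Both points can be checked without running anything, by calling parse() on its own and counting the expressions it produces. A sketch, using the example string from above plus an illustrative input y (an assumption, not from the question) that spells the quotes as \u0027 and \u0022:

```r
x <- "this is a '; print(dir()); 'string"

naive   <- paste0("'", x, "'")
escaped <- paste0("'", gsub("'", "\\\\'", x), "'")

# parse() only builds expressions; nothing runs until eval().
length(parse(text = naive))    # 3: a string, print(dir()), another string
length(parse(text = escaped))  # 1: a single string constant
eval(parse(text = escaped))    # "this is a '; print(dir()); 'string"

# Quotes spelled as \u0027 and \u0022 (literal backslash-u text in y) are
# decoded inside the literal and do not terminate it.
y <- "it\\u0027s\\u0022; print(dir()); \\u0027"
y_txt <- paste0("'", gsub("'", "\\\\'", y), "'")
length(parse(text = y_txt))    # 1
eval(parse(text = y_txt))      # value contains real ' and " characters
```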
Are there any holes in this approach that I've missed?
Answer 1:
The input
this is a \'; print(dir()) #
is escaped to:
'this is a \\'; print(dir()) #'
The double backslash is evaluated as a literal backslash, so the quote after it is active: it closes the string, the code is executed, and the # comments out the trailing quote.
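A sketch reproducing the bypass with parse() alone, so nothing is executed. The payload is an illustrative choice: it ends in an R comment (#) so that the closing quote appended by the wrapper is commented out and the text still parses.

```r
# In R source "\\" is one backslash, so x holds: this is a \'; print(dir()) #
x <- "this is a \\'; print(dir()) #"

txt <- paste0("'", gsub("'", "\\\\'", x), "'")
writeLines(txt)             # 'this is a \\'; print(dir()) #'

exprs <- parse(text = txt)  # parsing alone does not run anything
length(exprs)               # 2: a string constant, then a call
exprs[[1]]                  # "this is a \\" (ends in one backslash)
deparse(exprs[[2]])         # "print(dir())": would run under eval()
```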
Also, I don't know R well, but you could probably at least make it fail, e.g. with raw control characters like newlines or with invalid escapes.
eval is a mug's game in general. Normal string handling (searching the string for the sequence you want and replacing it) is the better approach, and using an existing library for a particular, properly specified format is best of all. For example, if you have JSON, use a JSON parser. There are many possible string literal formats that use \u escapes, all with slightly different rules, so you will want to choose the exact format correctly.
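Along the lines of the "normal string handling" suggestion, here is a sketch in base R that decodes \uXXXX escapes with a regular expression instead of eval(), so the input is only ever treated as data. It assumes the simplest possible format (a backslash, u, and exactly four hex digits), with no escaped backslashes and no surrogate-pair handling; the function name unescape_u is hypothetical.

```r
# Hypothetical helper: decodes \uXXXX escapes (backslash, u, exactly four hex
# digits) with regex matching, never eval(). Assumes no escaped backslashes
# and does not combine UTF-16 surrogate pairs.
unescape_u <- function(x) {
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    # Drop the leading "\u" and read the remaining digits as a hex code point.
    codes <- strtoi(substring(esc, 3L), base = 16L)
    vapply(codes, intToUtf8, character(1))
  })
  x
}

x <- "caf\\u00e9 \\u0027; print(dir()); #"
unescape_u(x)   # value: "café '; print(dir()); #" (plain text; nothing runs)
```

Because the escapes are decoded by matching rather than parsing, a decoded quote is just another character and can never end a literal.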
Answer 2:
There is the shQuote function, which could work for you:
eval(parse(text=shQuote(x)))
# [1] "this is a '; print(dir()); 'string"
Source: https://stackoverflow.com/questions/17770093/sanitising-strings-in-r