Sanitising strings in R

Posted by 两盒软妹~` on 2020-01-02 03:37:08

Question


This is related to a previous question, here: Converting a \u escaped Unicode string to ASCII

I proposed a solution involving eval(parse(text=x)), which for non-R users, means what it says: parsing the text string, then evaluating it. The aim was not to allow arbitrary code to be executed, but only to un-escape escaped Unicode text. Hence the solution:

eval(parse(text=paste0("'", x, "'")))

While this should be fairly safe given the restricted objective, I'd be interested to know: how much sanitisation is required to keep things safe?

At a minimum, I guess any embedded single and double quotes have to be escaped. For example, suppose we have

x <- "this is a '; print(dir()); 'string"

Then eval'ing this per the snippet above would execute the code in the middle. So we have to escape the quotes:

eval(parse(text=paste0("'",
                       gsub("'", "\\\\'", x),
                       "'")))

And similarly for double quotes. I don't think the unescaped Unicode equivalents \u0022 and \u0027 are a problem, since to the parser they'll be identical to plain " and '.
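Going back to the unescaped x from the example above, one way to see why the quotes matter without running anything is to call parse() alone and inspect the result. This is a minimal sketch; the variable names code and exprs are only illustrative:

```r
x <- "this is a '; print(dir()); 'string"

# Build the same text that the first snippet passes to eval(parse(...)).
code <- paste0("'", x, "'")

# parse() only builds the expressions; nothing is evaluated here.
exprs <- parse(text = code)

length(exprs)        # number of top-level expressions found
deparse(exprs[[2]])  # the middle expression is a call, not string content
```

A harmless input would parse to a single character constant, so checking that the result has length one and is a string is one possible guard, though it is no substitute for decoding the escapes properly.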

Are there any holes in this approach that I've missed?


Answer 1:


this is a \'; print(dir()); 'string

is escaped to:

'this is a \\'; print(dir()); 'string'

The double backslash is evaluated as a literal backslash, so the quote that follows it is active and the code is executed.
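The failure can be reproduced with a harmless payload. The sketch below is an illustration rather than the question's exact code: naive_unescape and sandbox are hypothetical names, evaluation is confined to a separate environment, and the payload ends in # to comment out the closing quote that paste0() appends (gsub() escapes every quote in the input, so a later quote cannot be relied on to close the string):

```r
# Separate environment so the injected assignment is easy to observe.
sandbox <- new.env()

# Hypothetical helper wrapping the quote-escaping approach from the question.
naive_unescape <- function(x) {
  escaped <- gsub("'", "\\\\'", x)   # turn every ' into \'
  code <- paste0("'", escaped, "'")
  eval(parse(text = code), envir = sandbox)
}

# Benign input: the \u escape is decoded as intended.
naive_unescape("\\u0041")

# Hostile input: the backslash in front of the quote pairs up with the
# backslash added by gsub(), leaving the quote free to end the string.
payload <- "\\'; injected <- TRUE #"
naive_unescape(payload)

# The assignment inside the payload was executed.
exists("injected", envir = sandbox, inherits = FALSE)
```

Escaping backslashes as well would close this particular hole, but it would also stop the \u sequences from being decoded, which defeats the purpose of the eval.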

Also, I don't know about R, but you could probably, at a minimum, cause a crash using raw control characters such as a newline, or invalid escapes.

eval is a mug's game in general. Normal string handling (search string for the sequence you want, replacing it) is the better approach, and using an existing library for a particular properly-specified format is best of all. For example if you have JSON, use a JSON parser. There are many possible string literal formats that use \u escapes, all with slightly different rules, so you will want to choose the exact format correctly.
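As a rough illustration of the plain string-handling route, the sketch below decodes \uXXXX sequences with a regular expression and intToUtf8(), never calling eval. The function name unescape_unicode is hypothetical, and the sketch assumes exactly four hex digits per escape, with no handling of surrogate pairs or other escape forms; a real format may have different rules:

```r
unescape_unicode <- function(x) {
  # Find every backslash-u followed by exactly four hex digits.
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  hits <- regmatches(x, m)
  # For each match: drop the leading "\u", read the hex digits,
  # and turn the resulting code point into a character.
  regmatches(x, m) <- lapply(hits, function(h) {
    codes <- strtoi(substring(h, 3), base = 16L)
    vapply(codes, intToUtf8, character(1))
  })
  x
}

# Escapes are decoded.
unescape_unicode("\\u0048\\u0069 there")

# Quotes and semicolons stay ordinary characters; nothing is parsed as code.
unescape_unicode("this is a '; print(dir()); 'string")
```

Because the input is only ever treated as data, there is nothing to sanitise against code execution; the remaining work is deciding exactly which escape format is being decoded.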




Answer 2:


There is the shQuote function, which could work for you:

eval(parse(text=shQuote(x)))
# [1] "this is a '; print(dir()); 'string"


Source: https://stackoverflow.com/questions/17770093/sanitising-strings-in-r
