R - Checking if a string is a valid mathematical expression using non-standard evaluation

Question

I would like to check if the strings below are valid mathematical expressions:

s1 = 'sin(x)'
s2 = 'sin(x*m)'
s3 = 'sin'
s4 = 'sin(xm)'

By 'valid', I mean the expression is a combination of:

- operators (which must be applied to variables or constants),
- the variables x and/or m,
- constants.

By this definition s1 and s2 are valid while s3 and s4 are not.

To identify whether a string is valid, I wrote a function checkFxn that first parses the string with str2lang, which returns a call, a symbol, or a constant. It then recurses through the resulting call tree and checks the above conditions. If they are satisfied, the parsed expression is returned as-is; if not, an error is thrown.

checkFxn <- function(x) {

  lang <- str2lang(x)

  checkFxn2 <- function(y) {

    if(is.name(y)) {

      stopifnot(deparse(y) %in% c('x', 'm'))

    } else if(is.call(y)) {

      stopifnot(is.function(eval(y[[1]])) || is.primitive(eval(y[[1]])))

      lapply(y[-1], checkFxn2)

    } else {

      stopifnot(is.logical(y) || is.numeric(y) || is.complex(y))

    }

    return(y)

  }

  checkFxn2(lang)

}


# Apply checkFxn to s1-s4
lapply(list(s1,s2,s3,s4), function(x) {try(checkFxn(x), silent = TRUE)})
[[1]]
sin(x)

[[2]]
sin(x * m)

[[3]]
[1] "Error in checkFxn2(lang) : deparse(y) %in% c(\"x\", \"m\") is not TRUE\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in checkFxn2(lang): deparse(y) %in% c("x", "m") is not TRUE>

[[4]]
[1] "Error in FUN(X[[i]], ...) : deparse(y) %in% c(\"x\", \"m\") is not TRUE\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in FUN(X[[i]], ...): deparse(y) %in% c("x", "m") is not TRUE>

It seems to work as expected, but I'm wary of my use of eval and was wondering if someone could suggest an alternative to it. I know that it follows the usual lexical scoping rules, so I'm worried about it evaluating variables in the global environment. Is there a way to restrict its scope? I've read the chapter on non-standard evaluation, but I can't figure it out.
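For context on the eval() concern: eval(y[[1]]) uses its default envir = parent.frame(), i.e. the frame of checkFxn2, and from there follows the ordinary chain of enclosing environments (checkFxn's frame, the global environment, the search path). An object named sin anywhere along that chain shadows base::sin, and when the function position is itself a call, as in f()(x), eval(y[[1]]) actually runs f(). A minimal sketch of a lookup that evaluates nothing, assuming that only functions defined in base should be accepted; isBaseFunction is a hypothetical helper, not part of the original code:

```r
# Hypothetical helper (not from the original post): TRUE only if `sym` is a
# symbol naming a function stored in the base package environment itself.
# inherits = FALSE stops the search there, so objects elsewhere cannot shadow
# it, and nothing is evaluated.
isBaseFunction <- function(sym) {
  is.name(sym) &&
    exists(as.character(sym), envir = baseenv(), mode = "function",
           inherits = FALSE)
}

# How eval() gets shadowed: `env` stands in for the global environment
env <- new.env()
assign("sin", 42, envir = env)
eval(quote(sin), envir = env)          # 42, so is.function() fails

isBaseFunction(quote(sin))             # TRUE, despite the shadowing object
isBaseFunction(as.name("+"))           # TRUE
isBaseFunction(quote(sinx))            # FALSE
isBaseFunction(quote(f()(x))[[1]])     # FALSE: the call f() is never run
```

This only replaces the is.function()/is.primitive() test; deciding which base functions count as mathematical operators still needs a whitelist.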

Also, is there a way to identify if a base function or primitive is a mathematical operator? I would like to use something more specific than is.function and is.primitive.

Answer 1:

Step 1: Decide what constitutes a "mathematical operator". One option is to retrieve the members of the relevant S4 group generics. For example,

mathOps <- unlist(lapply( c("Arith","Compare","Math"), getGroupMembers ))
#  [1] "+"        "-"        "*"        "^"        "%%"       "%/%"     
#  [7] "/"        "=="       ">"        "<"        "!="       "<="      
# [13] ">="       "abs"      "sign"     "sqrt"     "ceiling"  "floor"   
# [19] "trunc"    "cummax"   "cummin"   "cumprod"  "cumsum"   "exp"     
# [25] "expm1"    "log"      "log10"    "log2"     "log1p"    "cos"     
# [31] "cosh"     "sin"      "sinh"     "tan"      "tanh"     "acos"    
# [37] "acosh"    "asin"     "asinh"    "atan"     "atanh"    "cospi"   
# [43] "sinpi"    "tanpi"    "gamma"    "lgamma"   "digamma"  "trigamma"
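With such a whitelist, checking whether a name is a mathematical operator becomes a plain string comparison with %in%, so neither eval() nor is.function() is needed. A short sketch; the extra "Math2" group (round() and signif()) is shown only as an example of something that could be appended. Note that "(" belongs to none of these groups:

```r
library(methods)  # getGroupMembers() comes from the methods package

mathOps <- unlist(lapply(c("Arith", "Compare", "Math"), getGroupMembers))

# Membership is a string comparison: nothing is looked up or evaluated
c("sin", "+", "paste", "(") %in% mathOps
# [1]  TRUE  TRUE FALSE FALSE

# Further groups can be appended to the whitelist if they are wanted
mathOps2 <- c(mathOps, getGroupMembers("Math2"))
"round" %in% mathOps2
# [1] TRUE
```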

Step 2: Decompose your expressions into abstract syntax trees.

getAST <- function( ee ) 
    lapply( as.list(ee), function(x) `if`(is.call(x), getAST(x), x) )

# Example usage
getAST( quote(sin(x+5)) )
# [[1]]
# sin
# 
# [[2]]
# [[2]][[1]]
# `+`
# 
# [[2]][[2]]
# x
# 
# [[2]][[3]]
# [1] 5

Step 3: Traverse the ASTs according to your definition of "validity".

checkFxn <- function( ast, validOps )
{
  ## Terminal nodes of an AST will not be lists
  ## Wrap them into a list of length 1 to keep the recursion flow
  if( !is.list(ast) ) ast <- list(ast)

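  ## Operators must be plain symbols called with one or more arguments
  ## (is.name() rejects a call in the function position, e.g. sin(x)(m),
  ## which would otherwise give if() a condition of length > 1)
  if( is.name(ast[[1]]) && as.character(ast[[1]]) %in% validOps )
    return( `if`(length(ast) < 2, FALSE,
                 all(sapply(ast[-1], checkFxn, validOps))) )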

  ## Variables x and m are OK
  if( identical(ast[[1]], quote(x)) || identical(ast[[1]], quote(m)) )
    return(TRUE)

  ## Constants are OK
  if( is.numeric(ast[[1]]) ) return(TRUE)

  ## Everything else is invalid
  FALSE
}

Putting it all together

exprs <- lapply( list(s1,s2,s3,s4), str2lang )   # Convert strings to expressions
asts <- lapply( exprs, getAST )                  # Build ASTs
sapply( asts, checkFxn, mathOps )                # Evaluate validity
# [1]  TRUE  TRUE FALSE FALSE
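One consequence of the whitelist is that parentheses are parsed as calls to the function `(`, so an input such as (x + m) * 2 is rejected unless "(" is added to validOps. The self-contained sketch below repeats the definitions from the answer; its checkFxn adds an is.name() test (an assumption of this sketch) so that a call in the function position, as in sin(x)(m), returns FALSE, and isValid is a hypothetical convenience wrapper:

```r
library(methods)

mathOps <- unlist(lapply(c("Arith", "Compare", "Math"), getGroupMembers))

getAST <- function(ee)
  lapply(as.list(ee), function(x) `if`(is.call(x), getAST(x), x))

checkFxn <- function(ast, validOps) {
  if (!is.list(ast)) ast <- list(ast)
  # Operators: whitelisted plain symbols with at least one valid argument
  if (is.name(ast[[1]]) && as.character(ast[[1]]) %in% validOps)
    return(length(ast) >= 2 && all(sapply(ast[-1], checkFxn, validOps)))
  # Variables x and m are OK
  if (identical(ast[[1]], quote(x)) || identical(ast[[1]], quote(m)))
    return(TRUE)
  # Numeric constants are OK; everything else is invalid
  is.numeric(ast[[1]])
}

# Hypothetical wrapper: parse, build the AST, check it
isValid <- function(s, validOps) checkFxn(getAST(str2lang(s)), validOps)

isValid("(x + m) * 2", mathOps)           # FALSE: `(` is not whitelisted
isValid("(x + m) * 2", c(mathOps, "("))   # TRUE
```

Unary minus already passes without changes, since -x is a call to `-` with one argument.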

Alternative to ASTs

As pointed out by @Moody_Mudskipper, one can also use all.names to retrieve the names of all symbols (functions and variables alike) occurring in an arbitrary expression. While this doesn't preserve the relative structure of those symbols, the names can be compared directly against mathOps.
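A minimal sketch of that idea. all.names() returns every symbol, including function names, while all.vars() (a related base function, brought in here as an assumption of this sketch) returns only the symbols that are not in function position; requiring the latter to be x or m is what rejects the bare sin of s3. isValidByNames is a hypothetical helper. Because structure is lost, this check is coarser than the AST version: for example, it accepts x(m), and it ignores string constants such as the one in sin("a").

```r
library(methods)

mathOps <- unlist(lapply(c("Arith", "Compare", "Math"), getGroupMembers))

# Hypothetical helper: structure-free check based on the names in an expression
isValidByNames <- function(s, validOps, vars = c("x", "m")) {
  e <- str2lang(s)
  # non-function symbols must be the allowed variables ...
  all(all.vars(e) %in% vars) &&
    # ... and every name (functions included) must be an operator or variable
    all(all.names(e) %in% c(validOps, vars))
}

strs <- c("sin(x)", "sin(x*m)", "sin", "sin(xm)")
vapply(strs, isValidByNames, logical(1), validOps = mathOps, USE.NAMES = FALSE)
# [1]  TRUE  TRUE FALSE FALSE
```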

Source: https://stackoverflow.com/questions/58654348/r-checking-if-a-string-is-a-valid-mathematical-expression-using-non-standard-e

Tags

eval

lexical-scope

non-standard-evaluation