问题
I would like to check if the strings below are valid mathematical expressions:
s1 = 'sin(x)'
s2 = 'sin(x*m)'
s3 = 'sin'
s4 = 'sin(xm)'
By 'valid', I mean the expression is a combination of
- operators (must be used in conjunction with variables or constants)
- variables
x
and/orm
- constants.
By this definition s1
and s2
are valid while s3
and s4
are not.
To identify if a string is valid, I wrote a function checkFxn
that first attempts to convert the string into a call or one of its parts. If successful, it then recurses through the call-tree and checks for the above conditions. If the conditions are satisfied, then the call is returned as-is. If not, an error is thrown.
checkFxn <- function(x) {
lang <- str2lang(x)
checkFxn2 <- function(y) {
if(is.name(y)) {
stopifnot(deparse(y) %in% c('x', 'm'))
} else if(is.call(y)) {
stopifnot(is.function(eval(y[[1]])) | is.primitive(eval(y[[1]])))
lapply(y[-1], checkFxn2)
} else {
stopifnot(is.logical(y) | is.numeric(y) | is.complex(y))
}
return(y)
}
checkFxn2(lang)
}
#Applying checkFxn to s1-4
lapply(list(s1,s2,s3,s4), function(x) {try(checkFxn(x), silent = T)})
[[1]]
sin(x)
[[2]]
sin(x * m)
[[3]]
[1] "Error in checkFxn2(lang) : deparse(y) %in% c(\"x\", \"m\") is not TRUE\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in checkFxn2(lang): deparse(y) %in% c("x", "m") is not TRUE>
[[4]]
[1] "Error in FUN(X[[i]], ...) : deparse(y) %in% c(\"x\", \"m\") is not TRUE\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in FUN(X[[i]], ...): deparse(y) %in% c("x", "m") is not TRUE>
It seems to work as expected but I'm wary of my use of eval
and was wondering if someone could suggest an alternative to using it? I know that it follows the usual lexical scoping rules, so I'm worried about it evaluating variables in the gobal environment - is there a way to restrict its scope? I've read the chapter on non-standard evaluation but I can't figure it out.
Also, is there a way to identify if a base function or primitive is a mathematical operator? I would like to use something more specific than is.function
and is.primitive
.
回答1:
Step 1: Decide what constitutes a "mathematical operator". One option is to retrieve relevant groups from the S4 generics. For example,
mathOps <- unlist(lapply( c("Arith","Compare","Math"), getGroupMembers ))
# [1] "+" "-" "*" "^" "%%" "%/%"
# [7] "/" "==" ">" "<" "!=" "<="
# [13] ">=" "abs" "sign" "sqrt" "ceiling" "floor"
# [19] "trunc" "cummax" "cummin" "cumprod" "cumsum" "exp"
# [25] "expm1" "log" "log10" "log2" "log1p" "cos"
# [31] "cosh" "sin" "sinh" "tan" "tanh" "acos"
# [37] "acosh" "asin" "asinh" "atan" "atanh" "cospi"
# [43] "sinpi" "tanpi" "gamma" "lgamma" "digamma" "trigamma"
Step 2: Decompose your expressions into abstract syntax trees.
getAST <- function( ee )
lapply( as.list(ee), function(x) `if`(is.call(x), getAST(x), x) )
# Example usage
getAST( quote(sin(x+5)) )
# [[1]]
# sin
#
# [[2]]
# [[2]][[1]]
# `+`
#
# [[2]][[2]]
# x
#
# [[2]][[3]]
# [1] 5
Step 3: Traverse the ASTs based on your definition of "validity"
checkFxn <- function( ast, validOps )
{
## Terminal nodes of an AST will not be lists
## Wrap them into a list of length 1 to keep the recursion flow
if( !is.list(ast) ) ast <- list(ast)
## Operators must be called with one or more arguments
if( as.character(ast[[1]]) %in% validOps )
return( `if`(length(ast) < 2, FALSE,
all(sapply(ast[-1], checkFxn, validOps))) )
## Variables x and m are OK
if( identical(ast[[1]], quote(x)) || identical(ast[[1]], quote(m)) )
return(TRUE)
## Constants are OK
if( is.numeric(ast[[1]]) ) return(TRUE)
## Everything else is invalid
FALSE
}
Putting it all together
exprs <- lapply( list(s1,s2,s3,s4), str2lang ) # Convert strings to expressions
asts <- lapply( exprs, getAST ) # Build ASTs
sapply( asts, checkFxn, mathOps ) # Evaluate validity
# [1] TRUE TRUE FALSE FALSE
Alternative to ASTs
As pointed out by @Moody_Mudskipper, one can also use all.names
to retrieve the list of symbols occurring inside an arbitrary expression. While this doesn't preserve the relative structure of those symbols, the names can be compared directly against mathOps
.
来源:https://stackoverflow.com/questions/58654348/r-checking-if-a-string-is-a-valid-mathematical-expression-using-non-standard-e