How to force an error if non-finite values (NA, NaN, or Inf) are encountered

南笙 2021-01-02 03:27

There's a conditional debugging flag I miss from Matlab: dbstop if infnan described here. If set, this condition will stop code execution when an Inf

2 answers
  • 2021-01-02 03:49

    I fear there is no such shortcut. In theory, on Unix there is SIGFPE, which you could trap, but in practice:

    1. there is no standard way to make FP operations trap (even C99 doesn't include a provision for that); it is highly system-specific (e.g. feenableexcept on Linux, fp_enable_all on AIX etc.) or requires the use of assembler for your target CPU,
    2. FP operations are nowadays often done in vector units like SSE, so you can't even be sure that the FPU is involved, and
    3. R intercepts some operations on things like NaNs and NAs and handles them separately, so they won't make it to the FP code.

    That said, you could hack yourself an R that will catch some exceptions for your platform and CPU if you tried hard enough (disable SSE etc.). It is not something we would consider building into R, but for a special purpose it may be doable.

    However, it would still not catch NaN/NA operations unless you change R internal code. In addition, you would have to check every single package you are using since they may be using FP operations in their C code and may also handle NA/NaN separately.

    If you are only worried about things like division by zero or over/underflows, the above will work and is probably the closest to something like a solution.
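    To make the default behaviour concrete, here is a minimal standalone C sketch (not R code), assuming IEEE 754 arithmetic as on common platforms. The helpers divide() and classify() are hypothetical names used only for illustration. Division by zero and overflow do not raise any signal by default; they quietly produce Inf or NaN, which can only be spotted by inspecting the result:

```c
#include <math.h>   /* isnan, isinf: classification macros */

/* Plain division: with default IEEE 754 settings this never traps,
   it just returns Inf or NaN for the problematic cases. */
static double divide(double a, double b) { return a / b; }

/* Hypothetical helper: classify a result the way a manual check would.
   Returns 0 for a finite value, 1 for +/-Inf, 2 for NaN. */
static int classify(double x) {
    if (isnan(x)) return 2;
    if (isinf(x)) return 1;
    return 0;
}
```

    For example, classify(divide(1.0, 0.0)) reports Inf and classify(divide(0.0, 0.0)) reports NaN, and in neither case is execution interrupted.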

    Just checking your results may not be very reliable, because you don't know whether a result is based on some intermediate NaN calculation that changed an aggregated value which itself need not be NaN. If you are willing to disregard such cases, you could simply walk recursively through your result objects or the workspace. That should not be extremely inefficient, because you only need to worry about REALSXP and nothing else (unless you don't like NAs either, in which case you'd have more work).
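    As an illustration of why checking only the final result can miss things, here is a small standalone C sketch, assuming IEEE 754 comparison semantics. naive_max() and all_finite() are hypothetical helpers, not part of R. Any comparison involving NaN is false, so an aggregate built from comparisons can come out finite even though a NaN was present in the input:

```c
#include <math.h>    /* isfinite, NAN */
#include <stddef.h>  /* size_t */

/* Hypothetical aggregate: largest element, found with a plain '>' test.
   Assumes n >= 1. A NaN element never compares greater, so it is skipped. */
static double naive_max(const double *x, size_t n) {
    double best = x[0];
    for (size_t i = 1; i < n; i++)
        if (x[i] > best) best = x[i];
    return best;
}

/* Returns 1 if every element is finite, 0 as soon as a NaN or Inf is seen. */
static int all_finite(const double *x, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (!isfinite(x[i])) return 0;
    return 1;
}
```

    With the input {1.0, NAN, 2.0}, naive_max() returns 2.0, which passes a finiteness check on the result, while all_finite() on the input itself returns 0.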


    Here is some example code that could be used to traverse an R object recursively:

    #include <R.h>
    #include <Rinternals.h>

    static int do_isFinite(SEXP x) {
        /* recurse into generic vectors (lists) */
        if (TYPEOF(x) == VECSXP) {
            int n = LENGTH(x);
            for (int i = 0; i < n; i++)
                if (!do_isFinite(VECTOR_ELT(x, i))) return 0;
        }
        /* recurse into pairlists */ 
        if (TYPEOF(x) == LISTSXP) {
             while (x != R_NilValue) {
                 if (!do_isFinite(CAR(x))) return 0;
                 x = CDR(x);
             }
             return 1;
        }
        /* I wouldn't bother with attributes except for S4
           where attributes are slots */
        if (IS_S4_OBJECT(x) && !do_isFinite(ATTRIB(x))) return 0;
        /* check reals */
        if (TYPEOF(x) == REALSXP) {
            int n = LENGTH(x);
            double *d = REAL(x);
            for (int i = 0; i < n; i++) if (!R_finite(d[i])) return 0;
        }
        return 1; 
    }
    
    SEXP isFinite(SEXP x) { return ScalarLogical(do_isFinite(x)); }
    
    /* in R: .Call("isFinite", x) */
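    If NAs in integer vectors matter too (the "more work" case mentioned above), the check cannot rely on floating-point classification. The following standalone sketch assumes R's convention that the integer NA is stored as INT_MIN. SKETCH_NA_INTEGER and has_integer_na() are hypothetical names; real package code would use NA_INTEGER from Rinternals.h instead:

```c
#include <limits.h>  /* INT_MIN */
#include <stddef.h>  /* size_t */

/* Assumption: stands in for R's NA_INTEGER, which is stored as INT_MIN. */
#define SKETCH_NA_INTEGER INT_MIN

/* Returns 1 if any element equals the integer NA sentinel, 0 otherwise. */
static int has_integer_na(const int *x, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (x[i] == SKETCH_NA_INTEGER) return 1;
    return 0;
}
```

    Inside do_isFinite() this would correspond to an extra branch for INTSXP (and similarly LGLSXP) next to the REALSXP one.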
    
  • 2021-01-02 04:06

    The idea sketched below (and its implementation) is very imperfect. I'm hesitant to even suggest it, but: (a) I think it's kind of interesting, even in all of its ugliness; and (b) I can think of situations where it would be useful. Given that it sounds like you are right now manually inserting a check after each computation, I'm hopeful that your situation is one of those.

    Mine is a two-step hack. First, I define a function nanDetector() which is designed to detect NaNs in several of the object types that might be returned by your calculations. Then, I use addTaskCallback() to register stopOnNaNs(), which calls nanDetector() on .Last.value after each top-level task/calculation is completed. When it finds a NaN in one of those returned values, it throws an error, which you can use to avoid any further computations.

    Among its shortcomings:

    • If you do something like setting options(error = recover), it's hard to tell where the error was triggered, since the error is always thrown from inside of stopOnNaNs().

    • When it throws an error, stopOnNaNs() is terminated before it can return TRUE. As a consequence, it is removed from the task list, and you'll need to reset it with addTaskCallback(stopOnNaNs) if you want to use it again. (See the 'Arguments' section of ?addTaskCallback for more details.)

    Without further ado, here it is:


    # Sketch of a function that tests for NaNs in several types of objects
    nanDetector <- function(X) {
       # To examine data frames
       if(is.data.frame(X)) { 
           return(any(unlist(sapply(X, is.nan))))
       }
       # To examine vectors, matrices, or arrays
       if(is.numeric(X)) {
           return(any(is.nan(X)))
       }
       # To examine lists, including nested lists
       if(is.list(X)) {
           return(any(rapply(X, is.nan)))
       }
       return(FALSE)
    }
    
    # Set up the taskCallback
    stopOnNaNs <- function(...) {
        if(nanDetector(.Last.value)) {stop("NaNs detected!\n")}
        return(TRUE)
    }
    addTaskCallback(stopOnNaNs)
    
    
    # Try it out
    j <- 1:100
    y <- rnorm(99)
    l <- list(a=1:4, b=list(j=1:4, k=NaN))
    # Error in function (...)  : NaNs detected!
    
    # Subsequent time consuming code that could be avoided if the
    # error thrown above is used to stop its evaluation.
    