How do I write a function in R to do calculations on a record?

失恋的感觉 2021-01-21 14:42

In C# I am used to the concept of a data set and a current record. It would be easy for me to write a complicated calc-price function with conditions on the current record.

2 Answers
  • 2021-01-21 15:09

    Vectorization is one of the most fundamental (and unusual) things you'll need to get used to in R. Many (most?) R operations are vectorized, but a few aren't, and if(){}else{} is one of them. It's used for control flow (whether or not to run a code block), not for vector operations.

    ifelse() is a separate function that is used for vectors: the first argument is a "test", and the 2nd and 3rd arguments are the "if yes" and "if no" results. The test is a logical vector, and the returned value holds the appropriate yes/no result for each element of the test, so the result is the same length as the test.

    So we would write your IsPretty function like this:

    IsPretty <- function(PetalWidth){
      return(ifelse(PetalWidth > 0.3, "Y", "N"))
    }
    
    df <- iris
    df$Pretty = IsPretty(df$Petal.Width)
    

    Contrast this with an if(){...}else{...} block, where the test condition has length one and arbitrary code can be run in the ... parts. That code may return a bigger result than the test, a smaller result, or no result at all, and it might modify other objects. You can do anything inside if(){}else{}, but the test condition must have length 1.
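
    As a rough sketch of how a multi-condition rule, like the calc-price function mentioned in the question, might be vectorized: the example below nests ifelse() calls so that each row gets its own discount. The `orders` data frame, its column names, the `calc_price` function and the discount thresholds are all made up for illustration and are not from the original question.

```r
# Hypothetical order data; the column names are invented for this sketch.
orders <- data.frame(
  quantity   = c(1, 12, 50),
  unit_price = c(10, 10, 10),
  member     = c(FALSE, TRUE, TRUE)
)

# Vectorized pricing rule: every argument is a whole column,
# and the nested ifelse() calls pick a discount for each row.
calc_price <- function(quantity, unit_price, member) {
  discount <- ifelse(quantity >= 50, 0.20,
              ifelse(member & quantity >= 10, 0.10, 0))
  quantity * unit_price * (1 - discount)
}

orders$price <- calc_price(orders$quantity, orders$unit_price, orders$member)
orders
```

    Note that the conditions are combined with & (elementwise) rather than && (which expects single values), for the same reason that ifelse() is used instead of if().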

    You could use your IsPretty function one row at a time - it will work fine for any one row. So we could put it in a loop as below, checking one row at a time, giving if() one test at a time, and assigning results one at a time. But R is optimized for vectorization, so this will be noticeably slower and is a bad habit.

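    IsPrettyIf <- function(PetalWidth) {
      # if() needs a single TRUE/FALSE, so PetalWidth must be one value
      if (PetalWidth > 0.3) return("Y")
      return("N")
    }
    
    # Create the result column up front, then fill it one row at a time
    df$PrettyLoop <- NA_character_
    for (i in seq_len(nrow(df))) {
      df$PrettyLoop[i] <- IsPrettyIf(df$Petal.Width[i])
    }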
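
    If a function really can only handle one value at a time, one possible middle ground is to let vapply() do the looping. This is only a sketch: `is_pretty_scalar` and the `PrettyApply` column are illustrative names, and the function is still called once per element, so it should not be expected to match the speed of ifelse().

```r
# Scalar-only function with the same logic as IsPrettyIf.
is_pretty_scalar <- function(petal_width) {
  if (petal_width > 0.3) "Y" else "N"
}

df <- iris

# vapply() calls the function once per element and checks that
# every call returns exactly one character value.
df$PrettyApply <- vapply(df$Petal.Width, is_pretty_scalar, character(1))

table(df$PrettyApply)
```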
    

    The benchmark below shows that the vectorized version is roughly 50x faster on mean run time (about 80x on the median). This is such a simple case and such small data that it doesn't much matter, but on larger data, or with more complex operations, the difference between vectorized and non-vectorized code can be minutes vs days.

    microbenchmark::microbenchmark(
      loop = {
        for(i in 1:nrow(df)) {
          df$PrettyLoop[i] = IsPrettyIf(df$Petal.Width[i])
        }
      },
      vectorized = {
        df$Pretty = IsPretty(df$Petal.Width)    
      }
    )
    Unit: microseconds
           expr    min     lq     mean median      uq     max neval
           loop 3898.9 4365.6 5880.623 5442.3 7041.10 11344.6   100
     vectorized   47.7   59.6  112.288   67.4   83.85  1819.4   100
    

    This is a common stumbling block for R learners - you can find many questions on Stack Overflow where people use if(){}else{} when they need ifelse(), or vice versa. "Why can't ifelse return vectors?" is a FAQ coming from the opposite side of the problem.


    What goes on in your attempt?

    df <- iris
    
    ## The condition has length equal to the number of rows in the data frame
    df$Petal.Width > 0.3
    #>   [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
    #>  [13] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE
    ## ... truncated
    
    
    ## R warns us that only the first value (which happens to be FALSE) is used
    ## (from R 4.2.0 onwards this is an error instead of a warning)
    result = if(df$Petal.Width > 0.3) {"Y"} else {"N"}
    #> Warning in if (df$Petal.Width > 0.3) {: the condition has length > 1 and only
    #> the first element will be used
    
    ## So the result is a single "N"
    result  
    #> [1] "N"
    
    length(result)
    #> [1] 1
    
    
    ## R "recycles" inputs that are of insufficient length
    ## so we get a full column of "N"
    df$Pretty = result
    head(df)
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Pretty
    #> 1          5.1         3.5          1.4         0.2  setosa      N
    #> 2          4.9         3.0          1.4         0.2  setosa      N
    #> 3          4.7         3.2          1.3         0.2  setosa      N
    #> 4          4.6         3.1          1.5         0.2  setosa      N
    #> 5          5.0         3.6          1.4         0.2  setosa      N
    #> 6          5.4         3.9          1.7         0.4  setosa      N
    

    Created on 2020-11-08 by the reprex package (v0.3.0)
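
    One way to catch this kind of mistake early, sketched here as a suggestion rather than as part of the original attempt, is to have the scalar function check the length of its input. `IsPrettyChecked` is a hypothetical name used only for this example.

```r
IsPrettyChecked <- function(PetalWidth) {
  # Fail loudly if a whole column is passed where one value is expected
  stopifnot(length(PetalWidth) == 1)
  if (PetalWidth > 0.3) "Y" else "N"
}

IsPrettyChecked(0.4)

# Passing the whole column now stops with an error; try() lets the
# script carry on so the message can be inspected.
res <- try(IsPrettyChecked(iris$Petal.Width), silent = TRUE)
inherits(res, "try-error")
```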

  • 2021-01-21 15:30

    For my own notes on Gregor's answer

    IsPrettyIf <- function(row) {
      ret <- "N"
      if (row$Petal.Width > 0.3) {
        ret <- "Y"
      }
      return(ret)
    }
    
    
    df <- iris
    df$PrettyLoop <- ""  # add a column and initialize all the cells to be empty
    for (i in 1:5) {
      df$PrettyLoop[i] <- IsPrettyIf(df[i, ])
      cat("Row", i, "is Pretty?", df$PrettyLoop[i], "\n")
    }
    

    The bit that trips me up, thinking in terms of the spreadsheet analogy, is that row$PrettyLoop is like a single cell while df$PrettyLoop is like a whole column.
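
    A small sketch of that cell-versus-column difference, using the built-in iris data (`first_row` is just an illustrative variable name):

```r
df <- iris

first_row <- df[1, ]              # a one-row data frame

length(first_row$Petal.Width)     # one value: a single "cell"
length(df$Petal.Width)            # one value per row: the whole column

# if() is fine with the single cell...
if (first_row$Petal.Width > 0.3) "Y" else "N"

# ...while ifelse() handles the whole column at once
head(ifelse(df$Petal.Width > 0.3, "Y", "N"))
```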
