R variable names in loop, get, etc

忘掉有多难 2021-01-26 15:00

Still relatively new to R. I'm trying to use dynamic variable names in a loop but am running into all sorts of problems. My initial code looks something like this (but bigger):



        
1 Answer
  • 2021-01-26 15:28

    The problem you have relates to the way data frames and most other objects are treated in R. In many programming languages, objects are (or at least can be) passed to functions by reference: in C++, if I pass a pointer to an object to a function that manipulates that object, the original is modified. For the most part, this is not how things work in R. A function receives what behaves like a copy of its argument, so changes made inside the function do not affect the caller's object.
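    A minimal sketch of that difference, using a hypothetical helper add_flag() and a toy data frame (neither comes from the question): the function changes its own copy, and the caller's object only changes once the returned value is assigned back.

```r
# Hypothetical helper: adds a column to its *local* copy of the data frame
add_flag <- function(d) {
  d$flag <- TRUE   # modifies the function's copy only
  d                # return the modified copy
}

people <- data.frame(age = c(22, 38, 26))

add_flag(people)                   # result is printed, then discarded
print("flag" %in% names(people))   # FALSE: the original is untouched

people <- add_flag(people)         # assign the result back to keep the change
print("flag" %in% names(people))   # TRUE
```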

    When an object is created like this:

    x <- list(a = 5, b = 9)
    

    And then copied like this:

    y <- x
    

    Initially, y and x point to the same object in memory. But as soon as y is modified at all, R makes a copy (this behaviour is called copy-on-modify), so assigning y$c <- 12 has no effect on x.
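    The copy-on-modify behaviour can be checked directly with the same x and y as above:

```r
x <- list(a = 5, b = 9)
y <- x                   # no copy yet: y and x refer to the same list
y$c <- 12                # modifying y triggers a copy; x is not touched

print(names(x))          # "a" "b"
print(names(y))          # "a" "b" "c"
print(identical(x, y))   # FALSE
```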

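    get() has the same limitation: it returns the value of the named object, not a reference to the variable itself. You cannot assign into get("data.train") directly, and if you first assign its result to another variable and modify that, only the copy changes; the original variable is left unaltered.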
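    A minimal sketch of the get() round trip, assuming a toy data.train with a numeric Pclass_F column (the question's real data is not shown): modifying the value returned by get() leaves the original alone, and the change only sticks once it is written back by name with assign().

```r
# Hypothetical toy data frame standing in for the question's data
data.train <- data.frame(Pclass_F = c(1, 3, 3))

tmp <- get("data.train")                  # tmp holds the value, not the variable
tmp$Pclass_F <- as.factor(tmp$Pclass_F)   # only tmp changes

print(is.factor(data.train$Pclass_F))     # FALSE: original unchanged

assign("data.train", tmp)                 # write the modified copy back by name
print(is.factor(data.train$Pclass_F))     # TRUE
```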

    The idiomatic way to do this in R is to store your data frames in a named list. You can then loop over the list and use replacement syntax on each element to change its columns.

    datalist <- list(data.train = data.train, data.test = data.test)
    for (name in names(datalist)) {
      # Replacement syntax updates the column of the element stored in the list
      datalist[[name]]$Pclass_F <- as.factor(datalist[[name]]$Pclass_F)
    }
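    A self-contained version of the loop, assuming toy data frames in place of the question's data.train and data.test. The final list2env() call is an optional extra step (not part of the original answer) that copies the list elements back into standalone variables, if you still need them.

```r
# Hypothetical toy data standing in for the question's data frames
data.train <- data.frame(Pclass_F = c(1, 3, 3, 2))
data.test  <- data.frame(Pclass_F = c(2, 1))

datalist <- list(data.train = data.train, data.test = data.test)

for (name in names(datalist)) {
  # Replace the column inside each list element
  datalist[[name]]$Pclass_F <- as.factor(datalist[[name]]$Pclass_F)
}

print(sapply(datalist, function(d) is.factor(d$Pclass_F)))   # both TRUE

# Optional: overwrite the standalone variables with the modified list elements
invisible(list2env(datalist, envir = globalenv()))
```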
    

    You could also use:

    datalist <- setNames(lapply(list(data.train, data.test), function(data) {
      data$Pclass_F <- as.factor(data$Pclass_F)
      data  # return the modified copy
    }), c("data.train", "data.test"))
    

    This uses lapply() to process each element of the list, returning a new list that contains the modified data frames.
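    A runnable sketch of the lapply() approach, again with hypothetical toy data. Because lapply() keeps the names of a named input list, passing a named list avoids the separate setNames() call; this is a stylistic alternative, not a behavioural change.

```r
# Hypothetical toy data standing in for the question's data frames
data.train <- data.frame(Pclass_F = c(1, 3, 3, 2))
data.test  <- data.frame(Pclass_F = c(2, 1))

# lapply() over a *named* list keeps the names in the result
datalist <- lapply(list(data.train = data.train, data.test = data.test),
                   function(data) {
                     data$Pclass_F <- as.factor(data$Pclass_F)
                     data   # return the modified copy
                   })

print(names(datalist))                        # "data.train" "data.test"
print(levels(datalist$data.train$Pclass_F))   # "1" "2" "3"
```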

    In theory, you could achieve what you were originally trying to do by using the [[ operator on the global environment, but it would be an unconventional way of doing things and may lead to confusion later on.
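    For completeness, a sketch of that unconventional route, assuming the code runs at top level so the data frames live in the global environment (toy data again). env[[name]] reads a variable by its name, and assigning through it rebinds that variable in the environment.

```r
# Hypothetical toy data standing in for the question's data frames
data.train <- data.frame(Pclass_F = c(1, 3, 3, 2))
data.test  <- data.frame(Pclass_F = c(2, 1))

env <- globalenv()   # where data.train and data.test live when run at top level
for (name in c("data.train", "data.test")) {
  # Read the variable by name, modify the column, and rebind it in env
  env[[name]]$Pclass_F <- as.factor(env[[name]]$Pclass_F)
}

print(is.factor(data.train$Pclass_F))   # TRUE
```

    It works, but the names of the variables being modified are hidden in strings, which is the kind of indirection the named-list approach avoids.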
