refer to range of columns by name in R

后端 未结 6 1683
暗喜
暗喜 2020-12-17 04:00

I need help with something that might be fairly simple in R. I want to refer to a range of columns in a data frame (e.g., extracting a few select variables). However, I don\

相关标签:
6条回答
  • 2020-12-17 04:33

    Use %in% in combination with names(). It's useful for grabbing a group of columns from a data frame. You can negate the expression when you want to keep just a subset and drop the rest. Type ?"%in%" at the R Console prompt for more details.

    set.seed(1234)
    mydf <- data.frame(A = runif(5, 1, 2),
                       B = runif(5, 3, 4),
                       C = runif(5, 5, 6),
                       D = runif(5, 7, 8),
                       E = runif(5, 9, 10))
    mydf
    
    keep.cols <- c('A','D','E')
    mydf[, names(mydf) %in% keep.cols]
    drop.cols <- c('A','B','C')
    mydf[, !names(mydf) %in% drop.cols]
    

    The data frame:

    > mydf
             A        B        C        D        E
    1 1.113703 3.640311 5.693591 7.837296 9.316612
    2 1.622299 3.009496 5.544975 7.286223 9.302693
    3 1.609275 3.232551 5.282734 7.266821 9.159046
    4 1.623379 3.666084 5.923433 7.186723 9.039996
    5 1.860915 3.514251 5.292316 7.232226 9.218800
    

    A subset of columns:

    > mydf[, names(mydf) %in% keep.cols]
             A        D        E
    1 1.113703 7.837296 9.316612
    2 1.622299 7.286223 9.302693
    3 1.609275 7.266821 9.159046
    4 1.623379 7.186723 9.039996
    5 1.860915 7.232226 9.218800
    

    Keeping a subset of columns and dropping the rest:

    > mydf[, !names(mydf) %in% drop.cols]
             D        E
    1 7.837296 9.316612
    2 7.286223 9.302693
    3 7.266821 9.159046
    4 7.186723 9.039996
    5 7.232226 9.218800
    
    0 讨论(0)
  • 2020-12-17 04:36

    You can call the column numbers by their names:

    set.seed(1234)
    > mydf <- data.frame(A = runif(5, 1, 2),
                         + B = runif(5, 3, 4),
                         + C = runif(5, 5, 6),
                         + D = runif(5, 7, 8),
                         + E = runif(5, 9, 10))
    > mydf
    mydf[c(match("A", names(mydf)):match("B", names(mydf)))]
             A        B
    1 1.113703 3.640311
    2 1.622299 3.009496
    3 1.609275 3.232551
    4 1.623379 3.666084
    5 1.860915 3.514251
    

    Here you can see that the match()-call actually gives the column number:

    > c(match("A", names(mydf)):match("B", names(mydf)))
    [1] 1 2
    

    I hope this is also helpful, it is similar to Neal's answer.

    0 讨论(0)
  • 2020-12-17 04:37

    I think I figured it out, but it's a bit ornery. Here's an example using mtcars to get the columns between hp and vs. do.call usually means there is a simpler way, though.

    mtcars[do.call(seq, as.list(match(c("hp", "vs"), colnames(mtcars))))]
    
    0 讨论(0)
  • 2020-12-17 04:47

    Getting a range of columns can be done in several ways. subset(data.frame, select = name4:name10), works but is quite long. I used that before I got annoyed writing long commands for a simple thing. I made a function to tackle the naming columns / not remembering column numbers in large data frames:

    coln <- function(X){
      y <- rbind(seq(1,ncol(X)))
      colnames(y) <- colnames(X)
    rownames(y) <- "col.number"
      return(y)} 
    

    Here is how it works:

    df <- data.frame(a = 1:10, b =10:1, c = 1:10)
    coln(df)
               a b c
    col.number 1 2 3
    

    Now you can call them with numbers and still look at names.

    0 讨论(0)
  • 2020-12-17 04:52

    Here is a fun little function that combines the ideas behind Largh's answer with a handy function call. To use it, just enter

    call.cols(mydata, "firstvarname", "lastvarname")

    call.cols <- function(df, startvar, endvar) {
      col.num <- function(df){
        var.nums <- seq(1,ncol(df))
        names(var.nums) <- colnames(df)      
        return(var.nums)
      } 
    
     start.num <- as.numeric(col.num(df)[startvar])
     end.num <- as.numeric(col.num(df)[endvar])
     range.num <- start.num:end.num
     return(df[range.num]) 
    }
    

    I plan to expand this to use for scale creation for psychometric research.

    0 讨论(0)
  • 2020-12-17 04:56

    A column number can be identified from a column name within a data frame as follows:

    which(colnames(mydf)=="a")
    

    where mydf is a data frame and a is the name of the column the column number is required for.

    (Source)

    This can be used to create a column range:

    firstcol = which(colnames(x)=="a")
    lastcol = which(colnames(x)=="b")
    
    mydf[c(firstcol:lastcol)]
    
    0 讨论(0)
提交回复
热议问题