Drop data frame columns by name

花落未央 2020-11-22 01:06

I have a number of columns that I would like to remove from a data frame. I know that we can delete them individually using something like:

df$x <- NULL
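The same NULL-assignment idiom also works on several columns at once if you index the data frame with a character vector of names. Here is a minimal base-R sketch. The columns x, y, z and w and the vector drops are illustrative only:

```r
# Illustrative data frame; the columns x, y, z, w are made up for this sketch
df <- data.frame(x = 1:3, y = 4:6, z = 7:9, w = 10:12)

# Hypothetical names of the columns to remove
drops <- c("x", "z")

# Assigning NULL to a list-style subset removes all the named columns in one step
df[drops] <- NULL

print(names(df))
```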
20 answers
  • 2020-11-22 01:35

    I keep thinking there must be a better idiom, but for subtraction of columns by name, I tend to do the following:

    df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
    
    # return everything except a and c
    df <- df[,-match(c("a","c"),names(df))]
    df
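match() returns NA for any name that is not in names(df), so the negative index above does not cope well with a drop list that may contain names missing from the data. The base-R variant below keeps the complement with setdiff() and simply ignores such names. It is a sketch, and drops and not_a_column are hypothetical:

```r
df <- data.frame(a = 1:10, b = 1:10, c = 1:10, d = 1:10)

# "not_a_column" is deliberately absent from df
drops <- c("a", "c", "not_a_column")

# Keep every column whose name is not in drops; names absent from df are ignored.
# drop = FALSE keeps a data frame even if only one column survives.
df2 <- df[, setdiff(names(df), drops), drop = FALSE]

print(names(df2))
```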
    
  • 2020-11-22 01:35

    Find the indices of the columns you want to drop using which(), give these indices a negative sign (*-1), and then subset on those values to remove the columns from the data frame. For example:

    DF <- data.frame(one=c('a','b'), two=c('c', 'd'), three=c('e', 'f'), four=c('g', 'h'))
    DF
    #  one two three four
    #1   a   c     e    g
    #2   b   d     f    h
    
    DF[which(names(DF) %in% c('two','three')) *-1]
    #  one four
    #1   a    g
    #2   b    h
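Negating which() has one caveat. If none of the names match, which() returns integer(0), and indexing with an empty negative index selects zero columns rather than all of them. Logical indexing with %in% avoids that. The short sketch below rebuilds the DF from the answer, and the name five is hypothetical and deliberately missing:

```r
DF <- data.frame(one = c('a', 'b'), two = c('c', 'd'),
                 three = c('e', 'f'), four = c('g', 'h'))

# "five" is not a column of DF, so nothing should be dropped
drops <- c('five')

by_which <- DF[which(names(DF) %in% drops) * -1]  # empty index: every column is lost
by_logical <- DF[!names(DF) %in% drops]           # logical index: all columns kept

print(ncol(by_which))
print(ncol(by_logical))
```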
    
  • 2020-11-22 01:37

    dplyr solution

    I doubt this will get much attention down here, but if you have a list of columns that you want to remove and you want to do it in a dplyr chain, I use one_of() in the select() clause:

    Here is a simple, reproducible example:

    library(dplyr)
    
    undesired <- c('mpg', 'cyl', 'hp')
    
    mtcars <- mtcars %>%
      select(-one_of(undesired))
    

    Documentation can be found by running ?one_of or here:

    http://genomicsclass.github.io/book/pages/dplyr_tutorial.html
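In dplyr 1.0 and later (tidyselect 1.0+), one_of() is superseded. Its documented replacements are all_of(), which errors if a name is missing, and any_of(), which skips missing names. The sketch below applies the strict version to the same example and assumes a recent dplyr is installed. It writes to a new object mt instead of overwriting mtcars:

```r
library(dplyr)

undesired <- c('mpg', 'cyl', 'hp')

# all_of() requires every name in undesired to exist; a typo would raise an error
mt <- mtcars %>%
  select(-all_of(undesired))

print(names(mt))
```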

  • 2020-11-22 01:37

    Beyond select(-one_of(drop_col_names)) demonstrated in earlier answers, there are a couple of other dplyr options for dropping columns using select() that do not require listing every column name (using the dplyr starwars sample data for some variety in column names):

    library(dplyr)
    starwars %>% 
      select(-(name:mass)) %>%        # the range of columns from 'name' to 'mass'
      select(-contains('color')) %>%  # any column name that contains 'color'
      select(-starts_with('bi')) %>%  # any column name that starts with 'bi'
      select(-ends_with('er')) %>%    # any column name that ends with 'er'
      select(-matches('^f.+s$')) %>%  # any column name matching the regex pattern
      select_if(~!is.list(.)) %>%     # not by column name but by data type
      head(2)
    
    # A tibble: 2 x 2
    homeworld species
      <chr>     <chr>  
    1 Tatooine  Human  
    2 Tatooine  Droid 
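select_if() is also superseded as of dplyr 1.0. In that version the type-based step can be written inside select() with the where() helper. The minimal sketch below assumes dplyr 1.0 or later and drops all list columns of starwars:

```r
library(dplyr)

# where(is.list) selects columns whose values are lists;
# negating it with ! keeps every other column
sw <- starwars %>%
  select(!where(is.list))

print(names(sw))
```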
    

    If you need to drop a column that may or may not exist in the data frame, here's a slight twist using select_if() that, unlike one_of(), will not throw an "Unknown columns:" warning if a column name does not exist. In this example 'bad_column' is not a column in the data frame:

    starwars %>% 
      select_if(!names(.) %in% c('height', 'mass', 'bad_column'))
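With dplyr 1.0 or later, any_of() is another way to drop columns that may or may not exist, because it silently skips names that are missing from the data. The sketch below assumes a recent dplyr and reuses 'bad_column' as the absent name:

```r
library(dplyr)

# 'bad_column' does not exist in starwars; any_of() ignores it without a warning
sw <- starwars %>%
  select(-any_of(c('height', 'mass', 'bad_column')))

print(names(sw))
```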
    
  • 2020-11-22 01:39

    If you want to remove the columns by reference and avoid the internal copying associated with data.frames, you can use the data.table package and the := operator.

    You can pass a character vector of names as the left-hand side (LHS) of :=, and NULL as the right-hand side (RHS).

    library(data.table)
    
    df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
    DT <- data.table(df)
    # or more simply  DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10) #
    
    DT[, c('a','b') := NULL]
    

    If you want to predefine the names as a character vector outside the call to [, wrap the name of the object in () or {} to force the LHS to be evaluated in the calling scope rather than treated as a column name within the scope of DT.

    del <- c('a','b')
    DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
    DT[, (del) := NULL]
    DT <- data.table(a=1:10, b=1:10, c=1:10, d=1:10)
    DT[, {del} := NULL]
    # force(del) or c(del) on the LHS would also work.
    

    You can also use set, which avoids the overhead of [.data.table, and also works for data.frames!

    df <- data.frame(a=1:10, b=1:10, c=1:10, d=1:10)
    DT <- data.table(df)
    
    # drop `a` from df (no copying involved)
    
    set(df, j = 'a', value = NULL)
    # drop `b` from DT (no copying involved)
    set(DT, j = 'b', value = NULL)
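Because set() has so little overhead, it suits a loop over several names. The minimal sketch below assumes the data.table package is installed. drops and not_a_column are hypothetical, and intersect() keeps the loop from touching columns that do not exist:

```r
library(data.table)

DT <- data.table(a = 1:10, b = 1:10, c = 1:10, d = 1:10)
drops <- c('a', 'c', 'not_a_column')  # 'not_a_column' is deliberately absent

# set() removes each existing column by reference; DT is modified in place
for (col in intersect(drops, names(DT))) {
  set(DT, j = col, value = NULL)
}

print(names(DT))
```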
    
  • 2020-11-22 01:40

    There is a potentially more powerful strategy based on the fact that grep() returns a numeric vector of matching positions. If you have a long list of variables, as I do in one of my datasets, with some that end in ".A" and others that end in ".B", and you only want the ones that end in ".A" (along with all the variables that match neither pattern), do this:

    dfrm2 <- dfrm[ , -grep("\\.B$", names(dfrm)) ]
    

    For the case at hand, using Joris Meys's example (where drops is the character vector of column names to remove), it might not be as compact, but it would be:

    DF <- DF[, -grep( paste("^",drops,"$", sep="", collapse="|"), names(DF) )]
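Both grep() versions share a pitfall. If the pattern matches no names, grep() returns integer(0), and -integer(0) selects zero columns, so every column is dropped. Building a logical index with grepl() avoids this. The sketch below uses a made-up data frame, and its column names x.A, y.B and z are purely illustrative:

```r
# Hypothetical data frame: one ".A" column, one ".B" column, one unmatched column
dfrm <- data.frame(x.A = 1:3, y.B = 4:6, z = 7:9)

# Drop columns ending in ".B"; grepl() gives one TRUE/FALSE per name
dfrm2 <- dfrm[, !grepl("\\.B$", names(dfrm)), drop = FALSE]
print(names(dfrm2))

# A pattern that matches nothing now keeps every column instead of dropping all
dfrm3 <- dfrm[, !grepl("\\.C$", names(dfrm)), drop = FALSE]
print(ncol(dfrm3))
```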
    