How do I select variables in an R dataframe whose names contain a particular string?

臣服心动 2020-12-02 20:52

Two examples would be very helpful for me.

How would I select: 1) variables whose names start with b or B (i.e. case-insensitive) or 2) variables whose names contain

3 Answers
  • 2020-12-02 21:18

    If you just want the variable names:

    grep("^[Bb]", names(df), value=TRUE)
    
    grep("3", names(df), value=TRUE)
    

    If you want to select those columns, use either of the following:

    df[,grep("^[Bb]", names(df), value=TRUE)]
    df[,grep("^[Bb]", names(df))]
    

    The first selects by name; the second selects by column position.
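
    As a further sketch in base R (the data frame and its column names below are made up for illustration), grep's ignore.case argument can replace the [Bb] character class, and drop = FALSE keeps the result a data frame even when only one column matches:

    ```r
    # Hypothetical example data; the column names are for illustration only.
    df <- data.frame(a1 = 1:2, b1 = 3:4, B3 = 5:6, c3 = 7:8)

    # Names starting with b or B, using ignore.case instead of a character class.
    b_names <- grep("^b", names(df), ignore.case = TRUE, value = TRUE)

    # Names containing a literal "3"; fixed = TRUE skips regex interpretation.
    three_names <- grep("3", names(df), fixed = TRUE, value = TRUE)

    # drop = FALSE returns a data frame even if a single column matches.
    b_cols <- df[, b_names, drop = FALSE]

    # grepl returns a logical vector, which can also index columns.
    three_cols <- df[, grepl("3", names(df), fixed = TRUE), drop = FALSE]
    ```

    Without drop = FALSE, a pattern that matches exactly one column would return a plain vector rather than a one-column data frame.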

  • 2020-12-02 21:23

    I thought it was worth adding that select_vars is retired as of tidyverse version 1.2.1. Within the "tidyverse", tidyselect::vars_select() is now likely what you're looking for; see its documentation for details.

  • 2020-12-02 21:32

    While I like the answer above, I wanted to give a "tidyverse" solution as well. If you are doing a lot of pipes and trying to do several things at once, as I often do, you may like this answer. Also, I find this code more "humanly" readable.

    The function tidyselect::vars_select takes a character vector as its first argument, which should contain the column names of the data frame, and selects from those names using a select helper such as starts_with or matches:

    library(dplyr)
    library(tidyselect)
    
    
    df <- data.frame(a1 = factor(c("Hi", "Med", "Hi", "Low"), 
                             levels = c("Low", "Med", "Hi"), ordered = TRUE),
                 a2 = c("A", "D", "A", "C"), a3 = c(8, 3, 9, 9),
                 b1 = c(1, 1, 1, 2), b2 = c( 5, 4, 3,2), b3 = c(3, 4, 3, 4),
                 B1 = c(3, 6, 4, 4))
    
    # will select the names starting with a "b" or a "B"
    tidyselect::vars_select(names(df), starts_with('b', ignore.case = TRUE)) 
    
    # use select in conjunction with the previous code
    df %>%
      select(vars_select(names(df), starts_with('b', ignore.case = TRUE)))
    
    # Alternatively
    tidyselect::vars_select(names(df), matches('^[Bb]'))
    

    Note that the default for ignore.case is TRUE; I set it explicitly here to show how to adjust it. The .include and .exclude arguments of vars_select are also very useful (the older select_vars spelled them include and exclude, without the leading dot). For example, you could use vars_select(names(df), matches('^[Bb]'), .include = 'a1') if you wanted everything that starts with a "B" or a "b", plus "a1" as well.
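
    If the goal is the columns themselves rather than their names, the select helpers can usually be passed straight to dplyr::select. A minimal sketch, assuming a reasonably recent dplyr that re-exports the helpers, with a made-up data frame:

    ```r
    library(dplyr)

    # Hypothetical example data; the column names are for illustration only.
    df <- data.frame(a1 = 1:2, a3 = 3:4, b1 = 5:6, b3 = 7:8, B1 = 9:10)

    # Columns whose names start with b or B (ignore.case defaults to TRUE).
    b_cols <- df %>% select(starts_with("b"))

    # Columns whose names contain "3".
    three_cols <- df %>% select(contains("3"))

    # Case-sensitive variant: lowercase b only.
    b_strict <- df %>% select(starts_with("b", ignore.case = FALSE))
    ```

    This avoids building the name vector first, which tends to read more naturally in the middle of a longer pipe.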
