Difference between contains() and matches() for select() in dplyr

走了就别回头了 2021-01-03 11:57

I have decided to spend some time learning dplyr thoroughly. I have just come across the select() function and some of the helper functions that come with it.

1 Answer
  • 2021-01-03 12:33

    The difference is that matches() takes a regular expression as the pattern to match against column names, while contains() looks for a literal substring (which may also be the full name). Both are described in ?select_helpers as:

    contains(): Contains a literal string.

    matches(): Matches a regular expression.
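
    One place the distinction shows up is with regex metacharacters such as the dot. The following is a minimal sketch, assuming dplyr is installed; the data frame df_dots and its column names are made up for illustration and are not part of the original question.

    ```r
    library(dplyr)

    # Hypothetical data frame: two names contain a literal dot, one does not.
    df_dots <- data.frame(a.b = 1:2, c.d = 3:4, efg = 5:6)

    # contains() treats "." as a plain character.
    literal_cols <- names(select(df_dots, contains(".")))

    # matches() treats "." as the regex "any character", so every name matches.
    regex_cols <- names(select(df_dots, matches(".")))

    # Escaping the dot makes matches() behave like the literal search.
    escaped_cols <- names(select(df_dots, matches("\\.")))
    ```

    Here literal_cols and escaped_cols should hold only the two dotted names, while regex_cols should hold all three.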

    Consider a simple example where we want to select the columns whose names contain the substring 'col':

    library(dplyr)

    df1 <- data.frame(colnm = 1:5, col1 = 24, col2 = 46)
    df1 %>% 
        select(contains("col"))
    #  colnm col1 col2
    #1     1   24   46
    #2     2   24   46
    #3     3   24   46
    #4     4   24   46
    #5     5   24   46
    

    Here, contains() matches 'col' literally in the column names and selects those columns. Now suppose we change the criterion to 'col' followed by one or more digits, written as the regex col\\d+:

    df1 %>% 
       select(contains("col\\d+"))
    #data frame with 0 columns and 5 rows
    

    This returns no columns, because contains() looks for the pattern text itself (col\d+, backslash included) as a literal substring of the column names.

    df1 %>%
        select(matches("col\\d+")) 
    #  col1 col2
    #1   24   46
    #2   24   46
    #3   24   46
    #4   24   46
    #5   24   46
    

    matches(), by contrast, interprets the pattern as a regex, so it selects col1 and col2 but not colnm.
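
    Note that a pattern like col\\d+ is unanchored, and both helpers default to ignore.case = TRUE. The sketch below, again assuming dplyr is installed, shows how that plays out; the data frame df2 and its column names are invented for illustration.

    ```r
    library(dplyr)

    # Hypothetical data frame; the extra names are invented to show the effect.
    df2 <- data.frame(col1 = 1, mycol2x = 2, COL3 = 3, colnm = 4)

    # Unanchored and case-insensitive by default: matches anywhere in the name.
    loose <- names(select(df2, matches("col\\d+")))

    # Anchored and case-sensitive: the whole name must be "col" plus digits.
    strict <- names(select(df2, matches("^col\\d+$", ignore.case = FALSE)))
    ```

    With these names, loose should pick up col1, mycol2x and COL3, while strict should keep only col1. Anchoring with ^ and $ is a common way to avoid accidental partial matches.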
