How to drop columns by name pattern in R?


I have this dataframe:

state county city  region  mmatrix  X1 X2 X3    A1     A2     A3      B1     B2     B3      C1      C2      C3

  1      1     1

5 answers

深忆病人
2020-11-28 08:24

You can extend this with a regular expression for a broader pattern search. I have a data frame with a bunch of columns named "name", "upper_name", and "lower_name", since they represent confidence intervals for a bunch of series, but I don't need them all. Using a regex, you can do the following:

pattern = "(upper_[a-z]*)|(lower_[a-z]*)"
policyData <- policyData[, -grep(pattern = pattern, colnames(policyData))]


The "|" acts as an "or" in the regex, so I can drop both sets of columns in one pass with a single pattern rather than searching for each pattern separately.
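
One caveat with negative grep() indexing: if no column matches, grep() returns integer(0), and indexing with -integer(0) selects zero columns instead of keeping everything. Below is a minimal sketch of a logical-mask variant using grepl(). The data frame and its column names are made up for illustration (they are not the original policyData), and the pattern assumes the prefixes sit at the start of the name.

```r
# Hypothetical data: one series plus its confidence-interval bounds
policyData <- data.frame(
  gdp       = c(1.0, 1.2),
  upper_gdp = c(1.1, 1.3),
  lower_gdp = c(0.9, 1.1)
)

# Assumption: the prefixes appear at the start of the column name
pattern <- "^(upper|lower)_"

# grepl() returns a logical vector, so negating it with ! is safe
# even when nothing matches; drop = FALSE keeps the result a data frame
keep   <- !grepl(pattern, colnames(policyData))
result <- policyData[, keep, drop = FALSE]

# With no matches, every column is kept
no_match <- policyData[, !grepl("^zzz_", colnames(policyData)), drop = FALSE]

# The negative-index form selects zero columns in that case
all_gone <- policyData[, -grep("^zzz_", colnames(policyData)), drop = FALSE]
```

Here result keeps only the gdp column, no_match keeps all three columns, and all_gone has none, which is why the logical mask is the safer choice when the pattern might not match anything.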

闹比i
2020-11-28 08:27

To exclude every column whose name ends with a given string, you can use:

# Search string to exclude (anchored with "$", so it must end the column name)
strng <- "1"
df <- data.frame(matrix(runif(25, max = 10), nrow = 5))
colnames(df) <- paste("EX", 1:5)   # names are "EX 1" ... "EX 5"
df_red <- df[, -grep(paste0(strng, "$"), colnames(df), perl = TRUE)]

    df
#         EX 1     EX 2        EX 3     EX 4     EX 5
#   1 7.332913 4.972780 1.175947853 6.428073 8.625763
#   2 2.730271 3.734072 6.031157537 1.305951 8.012606
#   3 9.450122 3.259247 2.856123205 5.067294 7.027795
#   4 9.682430 5.295177 0.002015966 9.322912 7.424568
#   5 1.225359 1.577659 4.013616377 5.092042 5.130887

    df_red
#         EX 2        EX 3     EX 4     EX 5
#   1 4.972780 1.175947853 6.428073 8.625763
#   2 3.734072 6.031157537 1.305951 8.012606
#   3 3.259247 2.856123205 5.067294 7.027795
#   4 5.295177 0.002015966 9.322912 7.424568
#   5 1.577659 4.013616377 5.092042 5.130887
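
Because the search string is pasted into a regular expression, characters such as "." are treated as metacharacters rather than literal text. If the string should be matched literally, one option is base R's endsWith(). This is a minimal sketch with made-up column names, not part of the original answer.

```r
# Hypothetical column names containing a regex metacharacter (".")
df <- data.frame(a.1 = 1:3, ab1 = 4:6, a.2 = 7:9)
suffix <- ".1"

# Regex match: "." matches any character, so "ab1" is dropped as well
regex_red <- df[, !grepl(paste0(suffix, "$"), colnames(df)), drop = FALSE]

# Literal match: endsWith() compares the suffix character by character
literal_red <- df[, !endsWith(colnames(df), suffix), drop = FALSE]
```

The regex version leaves only a.2, while the literal version leaves ab1 and a.2.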


情歌与酒
2020-11-28 08:36

I found a simple answer using dplyr/tidyverse. select(-contains("This")) drops every column whose name contains "This". Note that contains() matches a literal substring, not a regex, and ignores case by default.
library(tidyverse) 
df_new <- df %>% select(-contains("This"))
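
For suffix or regex matching, the same select() call accepts other selection helpers. This is a small sketch assuming dplyr is installed; the data frame is hypothetical.

```r
library(dplyr)

# Hypothetical data frame for illustration
df <- data.frame(ID = 1:3, A1 = 4:6, A2 = 7:9, B1 = 10:12)

# Drop columns whose names end in "1" (literal suffix)
no_ones <- df %>% select(-ends_with("1"))

# Drop columns whose names match a regular expression
no_a <- df %>% select(-matches("^A[0-9]+$"))
```

no_ones keeps ID and A2; no_a keeps ID and B1.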


时光取名叫无心
2020-11-28 08:36

Just as an additional answer, since I stumbled across this while looking for a data.table solution to the problem:

library(data.table)
dt <- data.table(df)
drop.cols <- grep("1$", colnames(dt))
dt[, (drop.cols) := NULL]


暖寄归人
2020-11-28 08:41

Your code works like a charm if I apply it to a minimal example and just search for the string "A":

df <- data.frame(ID = 1:10,
                 A1 = rnorm(10),
                 A2 = rnorm(10),
                 B1 = letters[1:10],
                 B2 = letters[11:20])
df[, -grep("A", colnames(df))]


So your problem is really a regular-expression problem rather than a column-dropping problem. If I run your code, I get an error:

df[, -grep("\\3$", colnames(df))]
Error in grep("\\3$", colnames(df)) : 
  invalid regular expression '\3$', reason 'Invalid back reference'
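
The reason for the error: the R string "\\3" is the regex \3, which is a backreference to a third capture group, and this pattern has no groups at all. A literal digit needs no escaping. Below is a minimal sketch with a made-up data frame that contrasts the two and shows a valid backreference.

```r
# Hypothetical data frame for illustration
df <- data.frame(ID = 1:3, A1 = 1:3, A3 = 1:3, B3 = 1:3)

# "\\3" is the regex \3, a backreference to a third capture group that
# does not exist here, so the pattern fails to compile
bad <- tryCatch(
  grep("\\3$", colnames(df)),
  error = function(e) "error"
)

# A literal digit needs no escaping
good <- grep("3$", colnames(df), value = TRUE)

# A valid backreference: \1 repeats whatever group 1 matched
doubled <- grepl("^(.)\\1$", c("AA", "AB"))
```

Here good contains "A3" and "B3", and doubled is TRUE for "AA" and FALSE for "AB".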


Update: Why don't you just use the following expression?

df[, -grep("1$", colnames(df))]
   ID         A2 B2
1   1  2.0957940  k
2   2 -1.7177042  l
3   3 -0.0448357  m
4   4  1.2899925  n
5   5  0.7569659  o
6   6 -0.5048024  p
7   7  0.6929080  q
8   8 -0.5116399  r
9   9 -1.2621066  s
10 10  0.7664955  t
