How to select a range of columns in a dataframe based on their names and not their indexes?

前端未结

In a pandas dataframe created like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  colu


                      
5 answers

挽巷
2020-12-21 10:21

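A solution using the dplyr package, though you need to specify the rows you want to select beforehand:

library(dplyr)

# Row names to keep
rowName2Match <- c("r1", "r5")

df1 <- df %>% 
  select(matches("2"):matches("4")) %>%              # columns c2 through c4, matched by name
  add_rownames() %>%                                 # row names become a "rowname" column (deprecated in newer dplyr; see tibble::rownames_to_column())
  mutate(idRow = match(rowname, rowName2Match)) %>%  # NA for rows not in rowName2Match
  slice(which(!is.na(idRow))) %>%                    # keep only the matched rows
  select(-idRow)                                     # drop the helper column
df1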

> df1
Source: local data frame [2 x 4]

  rowname    c2    c3    c4
   <chr> <int> <int> <int>
1      r1     2     3     4
2      r5     6     7     8
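
Since the question itself is about pandas, the same selection (columns c2 through c4 for rows r1 and r5) could probably be sketched with label-based indexing as below. The frame is rebuilt to mirror the R example used in the answers here, and the names `df`, `wanted_rows` and `result` are only illustrative. One thing to keep in mind: a label slice in `.loc` includes both endpoints.

```python
import pandas as pd

# Rebuild the example frame from the R answers: columns c1..c6, rows r1..r6.
df = pd.DataFrame(
    {f"c{j}": range(j, j + 6) for j in range(1, 7)},
    index=[f"r{i}" for i in range(1, 7)],
)

wanted_rows = ["r1", "r5"]  # hypothetical list of row labels to keep

# Rows by an explicit list of labels, columns by a label slice (inclusive).
result = df.loc[wanted_rows, "c2":"c4"]
print(result)
```

This keeps the row labels as the index, so no helper column is needed.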

                                                                        
                                                        
            
            
              
                
醉酒成梦
2020-12-21 10:30

This seems way too easy so perhaps I'm doing something wrong.

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11,
                 row.names=c('r1', 'r2', 'r3', 'r4', 'r5', 'r6'))


df[c('r1','r2'),c('c1','c2')]

   c1 c2
r1  1  2
r2  2  3

                                                                        
                                                        
            
            
              
                
一向
2020-12-21 10:32

Adding onto @evan058's answer:

subset(df[rownames(df) %in% c("r3", "r4", "r5"),], select=c1:c4)

   c1 c2 c3 c4
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8


But note that the : operator works in select only because subset() treats the column names as their positions; it will not work on row names, so you will have to write out the name of each row you want to include explicitly. It might be easier to group by a particular value of one of your other columns, or to create an index column as @evan058 mentioned in the comments.
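
For the pandas side of the question, a rough equivalent of the `%in%` filter is a boolean mask built with `Index.isin`. The sketch below assumes the same c1..c6 by r1..r6 frame as the R example; `by_mask` and `by_slice` are illustrative names. Unlike R's subset(), a label slice also works on the rows in `.loc`.

```python
import pandas as pd

# Same example frame as in the R answers: columns c1..c6, rows r1..r6.
df = pd.DataFrame(
    {f"c{j}": range(j, j + 6) for j in range(1, 7)},
    index=[f"r{i}" for i in range(1, 7)],
)

# Boolean mask on the row labels, label slice on the columns.
by_mask = df.loc[df.index.isin(["r3", "r4", "r5"]), "c1":"c4"]

# A label slice on the rows selects the same block; both endpoints are included.
by_slice = df.loc["r3":"r5", "c1":"c4"]

print(by_mask)
```

The mask form is the one to reach for when the wanted rows are not contiguous.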
                                                                        
                                                        
            
            
              
                
灰色年华
2020-12-21 10:33

An alternative to subset, if you don't mind working with data.table, would be:

data.table::setDT(df)  # converts df in place; row names are dropped
df[1:3, c2:c4, with = FALSE]
   c2 c3 c4
1:  2  3  4
2:  3  4  5
3:  4  5  6


This still does not solve the problem of subsetting a range of rows by name, though.
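
In pandas, mixing positional rows with named columns like this can be sketched in two ways, shown below on the same assumed c1..c6 by r1..r6 frame. The names `chained`, `col_slice` and `single` are illustrative. `Index.slice_indexer` translates a pair of labels into a positional slice, which `.iloc` then accepts.

```python
import pandas as pd

# Same example frame as in the R answers: columns c1..c6, rows r1..r6.
df = pd.DataFrame(
    {f"c{j}": range(j, j + 6) for j in range(1, 7)},
    index=[f"r{i}" for i in range(1, 7)],
)

# Option 1: rows by position (first three), then columns by name.
chained = df.iloc[:3].loc[:, "c2":"c4"]

# Option 2: turn the column labels into a positional slice first,
# then index purely by position in a single step.
col_slice = df.columns.slice_indexer("c2", "c4")
single = df.iloc[:3, col_slice]

print(single)
```

Either way the row labels survive, which is not the case after setDT().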
                                                                        
                                                        
            
            
              
                
粉色の甜心
2020-12-21 10:35

It looks like you can accomplish this with a subset:

> df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
> rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
> subset(df, select=c1:c4)
   c1 c2 c3 c4
r1  1  2  3  4
r2  2  3  4  5
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
r6  6  7  8  9
> subset(df, select=c1:c2)
   c1 c2
r1  1  2
r2  2  3
r3  3  4
r4  4  5
r5  5  6
r6  6  7


If you want to subset by row name range, this hack would do:

> gRI <- function(df, rName) {which(match(rownames(df), rName) == 1)}
> df[gRI(df,"r2"):gRI(df,"r4"),]
   c1 c2 c3 c4 c5 c6
r2  2  3  4  5  6  7
r3  3  4  5  6  7  8
r4  4  5  6  7  8  9
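
A pandas counterpart of this row-name-to-position hack might look like the sketch below. `rows_between` is a hypothetical helper written for illustration, not a pandas function, and it assumes the row labels are unique so that `Index.get_loc` returns a single integer. In practice the plain label slice shown at the end does the same job.

```python
import pandas as pd

# Same example frame as in the R answers: columns c1..c6, rows r1..r6.
df = pd.DataFrame(
    {f"c{j}": range(j, j + 6) for j in range(1, 7)},
    index=[f"r{i}" for i in range(1, 7)],
)


def rows_between(frame, first, last):
    """Return the rows from label `first` through label `last`, inclusive.

    Hypothetical helper; assumes unique row labels.
    """
    start = frame.index.get_loc(first)  # label -> integer position
    stop = frame.index.get_loc(last)
    return frame.iloc[start:stop + 1]   # +1 because iloc excludes the end


via_helper = rows_between(df, "r2", "r4")

# The built-in label slice gives the same rows without a helper.
via_loc = df.loc["r2":"r4"]

print(via_helper)
```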

                                                                        
                                                        
            
            
              
                