Selecting multiple columns in a pandas dataframe

后端未结

关注

 19  1806

I have data in different columns but I don\'t know how to extract it to save it in another variable.

index  a   b   c
1      2   3   4
2      3   4   5


                      
              相关标签:


      
      
        
          19条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  青春惊慌失措        
                
              
                            
                2020-11-22 00:29
              
            
            
                                                                       
One different and easy approach : iterating rows
using iterows
 df1= pd.DataFrame() #creating an empty dataframe
 for index,i in df.iterrows():
    df1.loc[index,'A']=df.loc[index,'A']
    df1.loc[index,'B']=df.loc[index,'B']
    df1.head()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2020-11-22 00:32
              
            
            
                                                                       
Assuming your column names (df.columns) are ['index','a','b','c'], then the data you want is in the 
3rd & 4th columns. If you don't know their names when your script runs, you can do this

newdf = df[df.columns[2:4]] # Remember, Python is 0-offset! The "3rd" entry is at slot 2.


As EMS points out in his answer, df.ix slices columns a bit more concisely, but the .columns slicing interface might be more natural because it uses the vanilla 1-D python list indexing/slicing syntax.

WARN: 'index' is a bad name for a DataFrame column. That same label is also used for the real df.index attribute, a Index array. So your column is returned by df['index'] and the real DataFrame index is returned by df.index. An Index is a special kind of Series optimized for lookup of it's elements' values. For df.index it's for looking up rows by their label. That df.columns attribute is also a pd.Index array, for looking up columns by their labels.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  既然无缘        
                
              
                            
                2020-11-22 00:32
              
            
            
                                                                       
You could provide a list of columns to be dropped and return back the DataFrame with only the columns needed using the drop() function on a Pandas DataFrame.

Just saying

colsToDrop = ['a']
df.drop(colsToDrop, axis=1)


would return a DataFrame with just the columns b and c.

The drop method is documented here.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  北海茫月        
                
              
                            
                2020-11-22 00:34
              
            
            
                                                                       
The column names (which are strings) cannot be sliced in the manner you tried.
Here you have a couple of options. If you know from context which variables you want to slice out, you can just return a view of only those columns by passing a list into the __getitem__ syntax (the []'s).
df1 = df[['a', 'b']]

Alternatively, if it matters to index them numerically and not by their name (say your code should automatically do this without knowing the names of the first two columns) then you can do this instead:
df1 = df.iloc[:, 0:2] # Remember that Python does not slice inclusive of the ending index.

Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object. The first of the above methods will return a new copy in memory of the desired sub-object (the desired slices).
Sometimes, however, there are indexing conventions in Pandas that don't do this and instead give you a new variable that just refers to the same chunk of memory as the sub-object or slice in the original object. This will happen with the second way of indexing, so you can modify it with the copy() function to get a regular copy. When this happens, changing what you think is the sliced object can sometimes alter the original object. Always good to be on the look out for this.
df1 = df.iloc[0, 0:2].copy() # To avoid the case where changing df1 also changes df

To use iloc, you need to know the column positions (or indices). As the column positions may change, instead of hard-coding indices, you can use iloc along with get_loc function of columns method of dataframe object to obtain column indices.
{df.columns.get_loc(c): c for idx, c in enumerate(df.columns)}

Now you can use this dictionary to access columns through names and using iloc.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光说笑        
                
              
                            
                2020-11-22 00:35
              
            
            
                                                                       
You can use pandas.DataFrame.filter method to either filter or reorder columns like this:
df1 = df.filter(['a', 'b'])

This is also very useful when you are chaining methods.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  醉话见心        
                
              
                            
                2020-11-22 00:35
              
            
            
                                                                       
df[['a', 'b']]  # select all rows of 'a' and 'b'column 
df.loc[0:10, ['a', 'b']]  # index 0 to 10 select column 'a' and 'b'
df.loc[0:10, 'a':'b']  # index 0 to 10 select column 'a' to 'b'
df.iloc[0:10, 3:5]  # index 0 to 10 and column 3 to 5
df.iloc[3, 3:5]  # index 3 of column 3 to 5

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     上一页
1
2
3
4
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复