How to convert a Scikit-learn dataset to a Pandas dataset?

后端未结

关注

 22  1958

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?

from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
p


                      
              相关标签:


      
      
        
          22条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  温柔的废话        
                
              
                            
                2020-11-28 19:23
              
            
            
                                                                       
Took me 2 hours to figure this out

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
##iris.keys()


df= pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                 columns= iris['feature_names'] + ['target'])

df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)


Get back the species for my pandas
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  夕颜        
                
              
                            
                2020-11-28 19:24
              
            
            
                                                                       
This is easy method worked for me.
boston = load_boston()
boston_frame = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_frame["target"] = boston.target

But this can applied to load_iris as well.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  Happy的楠姐        
                
              
                            
                2020-11-28 19:24
              
            
            
                                                                       
I took couple of ideas from your answers and I don't know how to make it shorter :)

import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris['feature_names'])
df['target'] = iris['target']


This gives a Pandas DataFrame with feature_names plus target as columns and RangeIndex(start=0, stop=len(df), step=1).
I would like to have a shorter code where I can have 'target' added directly. 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  挽巷        
                
              
                            
                2020-11-28 19:25
              
            
            
                                                                       
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
X = iris['data']
y = iris['target']
iris_df = pd.DataFrame(X, columns = iris['feature_names'])
iris_df.head()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  -上瘾入骨i        
                
              
                            
                2020-11-28 19:27
              
            
            
                                                                       
Whatever TomDLT answered it may not work for some of you because 

data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                 columns= iris['feature_names'] + ['target'])


because iris['feature_names'] returns you a numpy array. In numpy array you can't add an array and a list ['target'] by just + operator. Hence you need to convert it into a list first and then add.

You can do 

data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                 columns= list(iris['feature_names']) + ['target'])


This will work fine tho..
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光取名叫无心        
                
              
                            
                2020-11-28 19:27
              
            
            
                                                                       
One of the best ways:

data = pd.DataFrame(digits.data)


Digits is the sklearn dataframe and I converted it to a pandas DataFrame
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
   
          
     1
2
3
4
下一页
           
           
        
                                  
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复