How to convert a Scikit-learn dataset to a Pandas dataset?

清酒与你 2020-11-28 19:10

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?

from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
22 Answers
  • 2020-11-28 19:23

    Took me 2 hours to figure this out

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    
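    iris = load_iris()
    # iris.keys() lists the fields available on the Bunch
    
    # Stack the features and the target side by side, then label the columns
    df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                      columns=iris['feature_names'] + ['target'])
    
    # Map the integer target codes back to the species names
    df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)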
    

    The last line recovers the species names as a categorical column in the DataFrame.
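
To illustrate how integer target codes map back to names, here is a minimal sketch of `pd.Categorical.from_codes` on its own. The `codes` and `names` values are made up for illustration and only stand in for `iris.target` and `iris.target_names`.

```python
import pandas as pd

# Hypothetical stand-ins for iris.target and iris.target_names
codes = [0, 2, 1, 0]
names = ["setosa", "versicolor", "virginica"]

# Each code is used as an index into the list of category names
species = pd.Categorical.from_codes(codes, categories=names)
print(list(species))
```

Each code is simply a position in the list of names, so the codes must be valid indices into it.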

  • 2020-11-28 19:24

    This easy method worked for me.

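    import pandas as pd
    from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
    
    boston = load_boston()
    boston_frame = pd.DataFrame(data=boston.data, columns=boston.feature_names)
    boston_frame["target"] = boston.target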
    

    This can be applied to load_iris as well.

  • 2020-11-28 19:24

    I took a couple of ideas from your answers, and I don't know how to make it shorter :)

    import pandas as pd
    from sklearn.datasets import load_iris
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris['feature_names'])
    df['target'] = iris['target']
    

    This gives a pandas DataFrame with the feature names plus target as columns and a RangeIndex(start=0, stop=len(df), step=1). I would like shorter code in which 'target' is added directly.
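
One possibly shorter route, assuming scikit-learn 0.23 or later (not something the answers above rely on), is the `as_frame=True` option of `load_iris`. The returned Bunch then carries a ready-made DataFrame in its `frame` attribute, with the target already included as the last column.

```python
from sklearn.datasets import load_iris

# Assumes scikit-learn >= 0.23, where the as_frame parameter is available
iris = load_iris(as_frame=True)

# iris.frame holds the four feature columns plus a 'target' column
df = iris.frame
print(df.head())
```

With this option the features alone are also available as a DataFrame in `iris.data`, and the target as a Series in `iris.target`.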

  • 2020-11-28 19:25
    import pandas as pd
    from sklearn.datasets import load_iris
    iris = load_iris()
    X = iris['data']
    y = iris['target']
    iris_df = pd.DataFrame(X, columns = iris['feature_names'])
    iris_df.head()
    
  • 2020-11-28 19:27

    TomDLT's answer may not work for some of you:

    data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= iris['feature_names'] + ['target'])
    

    This is because feature_names can be a NumPy array rather than a list (this depends on the dataset and the scikit-learn version). With a NumPy array, the + operator means element-wise addition, not concatenation, so adding the list ['target'] does not append a column name. You need to convert the array to a list first and then add.

    Instead, you can do:

    data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
                     columns= list(iris['feature_names']) + ['target'])
    

    This works whether feature_names is a list or an array.
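
The `list(...)` fix can be wrapped in a small helper. This is only a sketch: `bunch_to_frame` is a hypothetical name, and the `toy` dictionary is a hand-made stand-in for a Bunch whose `feature_names` happens to be a NumPy array.

```python
import numpy as np
import pandas as pd

def bunch_to_frame(bunch, target_name="target"):
    """Hypothetical helper: build a DataFrame from a Bunch-like mapping."""
    # list(...) makes + mean concatenation, whether feature_names
    # is a plain list or a NumPy array
    columns = list(bunch["feature_names"]) + [target_name]
    # Append the 1-D target as an extra column next to the 2-D features
    values = np.column_stack([bunch["data"], bunch["target"]])
    return pd.DataFrame(values, columns=columns)

# Made-up data standing in for a real dataset
toy = {
    "data": np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]]),
    "target": np.array([0, 0, 1]),
    "feature_names": np.array(["sepal length", "sepal width"]),
}
toy_df = bunch_to_frame(toy)
print(toy_df)
```

Because the features and the target are stacked into one array, the target column takes the float dtype of the features. Assigning the target as a separate column afterwards, as in the other answers, keeps it as integers.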

  • 2020-11-28 19:27

    One of the simplest ways:

    data = pd.DataFrame(digits.data)
    

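    Here digits is the scikit-learn dataset (a Bunch object, not a DataFrame), and I converted its data array to a pandas DataFrame. The columns are not named, and the target is not included.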
