How to convert a Scikit-learn dataset to a Pandas dataset?

清酒与你 2020-11-28 19:10

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?

from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()


22 Answers
  • 2020-11-28 19:43

    Just as an alternative that I found much easier to wrap my head around:

    data = load_iris()
    df = pd.DataFrame(data['data'], columns=data['feature_names'])
    df['target'] = data['target']
    df.head()
    

    Basically, instead of concatenating from the get-go, build a data frame from the feature matrix, then add the target column under whatever name you like (here df['target']), taking its values from data['target'].
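
As a small extension of the snippet above, the integer codes in the target column can be mapped to the class names stored in the Bunch. This is only a sketch: the column name 'species' is a hypothetical choice for this example and does not come from scikit-learn.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()

# Same construction as in the answer: features first, then the target column
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['target'] = data['target']

# 'species' is a hypothetical column name chosen for this sketch.
# from_codes turns the integer codes (0, 1, 2) into labelled categories.
df['species'] = pd.Categorical.from_codes(data['target'], data['target_names'])

print(df.head())
```

The integer column is kept alongside the labelled one, so either can be used for modelling or plotting.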

  • 2020-11-28 19:43

    Otherwise, use the seaborn data sets, which are actual pandas data frames:

    import seaborn
    iris = seaborn.load_dataset("iris")
    type(iris)
    # <class 'pandas.core.frame.DataFrame'>
    

    Compare with the scikit-learn data sets:

    from sklearn import datasets
    iris = datasets.load_iris()
    type(iris)
    # <class 'sklearn.utils.Bunch'>
    dir(iris)
    # ['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']
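
For comparison, scikit-learn itself can hand back pandas objects. The sketch below assumes scikit-learn 0.23 or newer, where load_iris accepts as_frame=True; on older versions this argument is not available.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn (assumed 0.23+) to return pandas objects
iris = load_iris(as_frame=True)

# iris.frame holds the feature columns plus a 'target' column
df = iris.frame

print(type(df))
print(df.head())
```

With this option the returned object is still a Bunch, but its frame attribute is a regular DataFrame, so no manual concatenation is needed.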
    
  • 2020-11-28 19:43

    Working off the best answer and addressing my comment, here is a function for the conversion:

    import numpy as np
    import pandas as pd

    def bunch_to_dataframe(bunch):
      # Copy the names so the Bunch's own feature_names is not modified in place
      features = list(bunch.feature_names) + ['target']
      # Stack the feature matrix and the target vector column-wise
      return pd.DataFrame(data=np.c_[bunch['data'], bunch['target']],
                          columns=features)
    
  • 2020-11-28 19:44

    This snippet is only syntactic sugar built upon what TomDLT and rolyat have already contributed and explained. The only differences are that load_iris returns a tuple instead of a dictionary, and that the column names are plain integer indexes.

    df = pd.DataFrame(np.c_[load_iris(return_X_y=True)])
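
If named columns are wanted with the return_X_y form, one possible sketch is to take the names from a separate load_iris() call. The 'target' label is our own choice here and is not something the tuple form provides.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# return_X_y=True gives a (features, target) tuple with no column names
X, y = load_iris(return_X_y=True)

# Feature names come from a second, ordinary load_iris() call;
# 'target' is a label chosen for this sketch
columns = list(load_iris().feature_names) + ['target']

# np.c_ stacks the target as a final column (values are upcast to float)
df = pd.DataFrame(np.c_[X, y], columns=columns)

print(df.head())
```

Because np.c_ builds a single array, the target values end up as floats; cast the column back with astype(int) if integer labels matter.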
    