How to convert a Scikit-learn dataset to a Pandas dataset?

清酒与你 2020-11-28 19:10

How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?

from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
22 Answers
  • 2020-11-28 19:39
    from sklearn.datasets import load_iris
    import pandas as pd
    
    data = load_iris()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df.head()
    

    This tutorial may be of interest: http://www.neural.cz/dataset-exploration-boston-house-pricing.html
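The DataFrame above holds only the feature columns. As a minimal sketch of one way to attach the class labels too, the snippet below uses `pd.Categorical.from_codes` to turn the integer codes in `data.target` into species names. The column name `species` is an assumption chosen for illustration, not something the dataset defines.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)

# from_codes pairs each integer code in data.target with the matching
# entry of data.target_names, giving a categorical column of species names.
# "species" is an illustrative column name, not part of the dataset.
df["species"] = pd.Categorical.from_codes(data.target, categories=data.target_names)

print(df["species"].value_counts())
```

A categorical column keeps the integer codes available through `df["species"].cat.codes`, so nothing is lost by storing the names.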

  • 2020-11-28 19:39

    Another way to combine the features and the target variable is np.column_stack:

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    
    data = load_iris()
    df = pd.DataFrame(np.column_stack((data.data, data.target)), columns = data.feature_names+['target'])
    print(df.head())
    

    Result:

       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
    0                5.1               3.5                1.4               0.2     0.0
    1                4.9               3.0                1.4               0.2     0.0
    2                4.7               3.2                1.3               0.2     0.0
    3                4.6               3.1                1.5               0.2     0.0
    4                5.0               3.6                1.4               0.2     0.0
    

    If you need the string label for the target, you can convert target_names to a dictionary, pass it to replace, and store the result in a new column:

    df['label'] = df.target.replace(dict(enumerate(data.target_names)))
    print(df.head())
    

    Result:

       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target   label
    0                5.1               3.5                1.4               0.2     0.0  setosa
    1                4.9               3.0                1.4               0.2     0.0  setosa
    2                4.7               3.2                1.3               0.2     0.0  setosa
    3                4.6               3.1                1.5               0.2     0.0  setosa
    4                5.0               3.6                1.4               0.2     0.0  setosa
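The target shows up as 0.0 rather than 0 in the output above because np.column_stack builds one array with a single dtype, so the integer class codes are upcast to float. A minimal sketch of casting them back and looking up the names with `map` (which behaves like the `replace` call here, since every code has an entry in the dictionary):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
df = pd.DataFrame(
    np.column_stack((data.data, data.target)),
    columns=data.feature_names + ["target"],
)

# The stacked array has a single float dtype, so the class codes became floats
print(df["target"].dtype)

# Cast the codes back to integers, then look up the species names
df["target"] = df["target"].astype(int)
df["label"] = df["target"].map(dict(enumerate(data.target_names)))
```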
    
  • 2020-11-28 19:39

    The API is a little cleaner than the other answers suggest. Passing as_frame=True makes the loader return pandas objects directly; just be sure to include the response (target) column as well.

    import pandas as pd
    from sklearn.datasets import load_wine
    
    # Load once; with as_frame=True, .data is a DataFrame and .target a Series
    wine = load_wine(as_frame=True)
    df = wine.data.copy()  # copy so the loaded Bunch is left unmodified
    df['target'] = wine.target
    
    df.head(2)
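When as_frame=True is used, the returned Bunch also carries a `frame` attribute that already combines features and target, so the manual assignment can be skipped. A short sketch, assuming scikit-learn 0.23 or newer (where as_frame was introduced):

```python
from sklearn.datasets import load_wine

# Assumes scikit-learn 0.23 or newer, where as_frame is available
wine = load_wine(as_frame=True)

# .frame holds the feature columns plus a "target" column
df = wine.frame

# Alternatively, get features and target as separate pandas objects
X, y = load_wine(return_X_y=True, as_frame=True)
```

The second form is convenient when the features and target go straight into an estimator's fit call.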
    
  • 2020-11-28 19:40

    The features are already in the scikit-learn Bunch under "data", and the "target" (the value to predict) is in the same Bunch.

    So you just need to join the two to get the complete dataset:

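    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    
    cancer = load_breast_cancer()  # assumed source of the `cancer` Bunch
    data_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
    target_df = pd.DataFrame(cancer.target, columns=['target'])
    
    final_df = data_df.join(target_df)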
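The same combination can be written with pd.concat along the column axis. This is a sketch under the assumption that the `cancer` variable in this answer comes from `load_breast_cancer()`, which the answer does not state.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Assumption: the answer's `cancer` variable is the breast cancer dataset
cancer = load_breast_cancer()

data_df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
target_s = pd.Series(cancer.target, name="target")

# axis=1 places the objects side by side, aligned on the shared row index
final_df = pd.concat([data_df, target_s], axis=1)
```

Both join and concat align on the index, so they give the same result here because the two objects share the default 0..n-1 index.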
    
  • 2020-11-28 19:41

    Manually, you can use the pd.DataFrame constructor, passing a numpy array (data) and a list of column names (columns). To have everything in one DataFrame, you can concatenate the features and the target into one numpy array with np.c_[...] (note the square brackets):

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_iris
    
    # load_iris() returns a Bunch; store it in iris
    # to check its type use: type(iris)
    # to list its attributes use: dir(iris)
    iris = load_iris()
    
    # np.c_ concatenates arrays column-wise, here iris['data']
    # and iris['target'] (the 1-D target becomes the last column)
    # for the columns argument: append a one-item list to iris['feature_names'];
    # you can name the new column anything you'd like
    # (the original dataset would probably call it 'Species')
    data1 = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                         columns=iris['feature_names'] + ['target'])
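If the same conversion is needed for several datasets, the np.c_ approach can be wrapped in a small function. The helper below, `bunch_to_frame`, is a hypothetical name introduced for illustration and is not part of scikit-learn or pandas; it assumes the Bunch has `data`, `target`, and `feature_names` entries and a one-dimensional target.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris, load_wine


def bunch_to_frame(bunch, target_name="target"):
    """Hypothetical helper: features and target of a Bunch in one DataFrame."""
    # np.c_ stacks its arguments column-wise; the 1-D target becomes one column
    values = np.c_[bunch["data"], bunch["target"]]
    # list(...) guards against datasets whose feature_names is a NumPy array
    columns = list(bunch["feature_names"]) + [target_name]
    return pd.DataFrame(values, columns=columns)


iris_df = bunch_to_frame(load_iris())
wine_df = bunch_to_frame(load_wine())
```

Because the stacked array has a single dtype, the target column comes back as floats; cast it with astype(int) if integer class codes are needed.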
    
  • 2020-11-28 19:42
    from sklearn.datasets import load_iris
    import pandas as pd
    
    iris_dataset = load_iris()
    
    datasets = pd.DataFrame(iris_dataset['data'],
                            columns=iris_dataset['feature_names'])
    target_val = pd.Series(iris_dataset['target'], name='target_values')
    
    # Map each integer class code to its species name
    species_names = {0: 'iris-setosa', 1: 'iris-versicolor', 2: 'iris-virginica'}
    species = target_val.map(species_names)
    
    datasets['target'] = target_val
    datasets['target_name'] = species
    datasets.head()
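The species names do not have to be typed by hand, since the Bunch already stores them in `target_names`. A minimal sketch using NumPy integer-array indexing; note that the dataset's own names are 'setosa', 'versicolor', and 'virginica', without the 'iris-' prefix used above.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Indexing the names array with the codes array returns one name per row
df["target_name"] = iris.target_names[iris.target]
```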
    