How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
p
Took me 2 hours to figure this out
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
##iris.keys()
df= pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target'])
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
Get back the species for my pandas
This is easy method worked for me.
boston = load_boston()
boston_frame = pd.DataFrame(data=boston.data, columns=boston.feature_names)
boston_frame["target"] = boston.target
But this can applied to load_iris as well.
I took couple of ideas from your answers and I don't know how to make it shorter :)
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris['feature_names'])
df['target'] = iris['target']
This gives a Pandas DataFrame with feature_names plus target as columns and RangeIndex(start=0, stop=len(df), step=1). I would like to have a shorter code where I can have 'target' added directly.
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
X = iris['data']
y = iris['target']
iris_df = pd.DataFrame(X, columns = iris['feature_names'])
iris_df.head()
Whatever TomDLT answered it may not work for some of you because
data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target'])
because iris['feature_names'] returns you a numpy array. In numpy array you can't add an array and a list ['target'] by just + operator. Hence you need to convert it into a list first and then add.
You can do
data1 = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= list(iris['feature_names']) + ['target'])
This will work fine tho..
One of the best ways:
data = pd.DataFrame(digits.data)
Digits is the sklearn dataframe and I converted it to a pandas DataFrame