How do I convert data from a Scikit-learn Bunch object to a Pandas DataFrame?
from sklearn.datasets import load_iris
import pandas as pd
data = load_iris()
Just as an alternative that I could wrap my head around much easier:
data = load_iris()
df = pd.DataFrame(data['data'], columns=data['feature_names'])
df['target'] = data['target']
df.head()
Basically, instead of concatenating from the get-go, build the data frame from the feature matrix first, then add the target column by assigning data['target'] (or whichever key holds the labels in your dataset) to it.
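If readable labels are more useful than the integer codes, the same two-step idea extends naturally. This is only a sketch: the "species" column name is my own choice, and it assumes the Bunch exposes target_names, as load_iris does.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()

# Step 1: feature matrix only, with the dataset's own column names
df = pd.DataFrame(data["data"], columns=data["feature_names"])

# Step 2: add the integer target column
df["target"] = data["target"]

# Optional: map the integer codes to their names.
# "species" is a hypothetical column name picked for this example.
df["species"] = pd.Categorical.from_codes(data["target"], data["target_names"])
```

The categorical column keeps the link between code and name, so df["species"].cat.codes gives the integers back.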
Otherwise use seaborn data sets which are actual pandas data frames:
import seaborn
iris = seaborn.load_dataset("iris")
type(iris)
# <class 'pandas.core.frame.DataFrame'>
Compare with scikit learn data sets:
from sklearn import datasets
iris = datasets.load_iris()
type(iris)
# <class 'sklearn.utils.Bunch'>
dir(iris)
# ['DESCR', 'data', 'feature_names', 'filename', 'target', 'target_names']
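For completeness: if your scikit-learn is 0.23 or newer (and pandas is installed), the loaders can hand back a data frame themselves via as_frame=True. A minimal sketch, assuming such a version:

```python
from sklearn.datasets import load_iris

# as_frame=True asks the loader to build pandas objects (scikit-learn >= 0.23)
iris = load_iris(as_frame=True)

# .frame holds the features plus a final "target" column
df = iris.frame
```

The result is still a Bunch, so iris.data and iris.target remain available, now as a DataFrame and a Series.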
Working off the best answer and addressing my comment, here is a function for the conversion:
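import numpy as np
import pandas as pd

def bunch_to_dataframe(bunch):
    # feature_names may be a list or a NumPy array, depending on the dataset
    fnames = bunch.feature_names
    features = fnames.tolist() if isinstance(fnames, np.ndarray) else list(fnames)
    # build a new list so the bunch's own feature_names is not modified in place
    columns = features + ['target']
    return pd.DataFrame(data=np.c_[bunch['data'], bunch['target']],
                        columns=columns)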
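One thing to keep in mind with the np.c_ approach: stacking features and target into a single array gives every column the same dtype, so an integer target comes back as float. If that matters, a variant along these lines keeps the target's own dtype. The function name bunch_to_dataframe_typed is made up for this sketch, and load_wine is used only to show it is not iris-specific.

```python
import pandas as pd
from sklearn.datasets import load_wine

def bunch_to_dataframe_typed(bunch):
    # Build the frame from the features alone; list() copies the names
    # and also handles feature_names stored as a NumPy array.
    df = pd.DataFrame(bunch["data"], columns=list(bunch["feature_names"]))
    # Assigning the target separately preserves its integer dtype.
    df["target"] = bunch["target"]
    return df

wine_df = bunch_to_dataframe_typed(load_wine())
```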
This snippet is only syntactic sugar built on what TomDLT and rolyat have already contributed and explained. The only differences are that load_iris(return_X_y=True) returns a (data, target) tuple instead of a dictionary-like Bunch, and that the column names are plain integers (0, 1, 2, ...) rather than the feature names.
df = pd.DataFrame(np.c_[load_iris(return_X_y=True)])
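If the integer column labels get in the way, they can be replaced afterwards. A small sketch, assuming you are fine with calling load_iris a second time just to read feature_names:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# return_X_y=True yields a (data, target) tuple
X, y = load_iris(return_X_y=True)
df = pd.DataFrame(np.c_[X, y])

# Columns are integer positions at this point
default_columns = list(df.columns)

# Swap in the real names, with "target" for the last column
df.columns = list(load_iris().feature_names) + ["target"]
```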