DataSet in Machine Learning
一、UCI Wine dataset: https://archive.ics.uci.edu/ml/datasets/Wine ,包含178个样本,每个样本包含13个与酒的化学特性的特征,标签有1,2,3,代表意大利不同地区生长的三种类型的葡萄 Breast Cancer Wisconsin dataset: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic) , 包含569个样本,每个样本是良性或者恶性癌细胞,M代表恶性,B代表良性,并且每个样本还有30个特征,可以用来构建模型预测样本是恶性还是良性。 import pandas as pd df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None) from sklearn.preprocessing import LabelEncoder X = df.loc[:, 2:].values y = df.loc[:, 1].values le = LabelEncoder() y = le.fit_transform(y) le