scikit-learn

confusion matrix and classification report of StratifiedKFold

独自空忆成欢 submitted on 2021-02-11 15:32:06
Question: I am using StratifiedKFold to check the performance of my classifier. I have two classes and I am trying to build a Logistic Regression classifier. Here is my code: skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0) for train_index, test_index in skf.split(x, y): x_train, x_test = x[train_index], x[test_index] y_train, y_test = y[train_index], y[test_index] tfidf = TfidfVectorizer() x_train = tfidf.fit_transform(x_train) x_test = tfidf.transform(x_test) clf = LogisticRegression(class
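The question text is cut off, so the following is only a minimal sketch of one way to get a single confusion matrix and classification report out of a StratifiedKFold loop: collect the true and predicted labels from every fold and score them once at the end. The toy corpus, the labels, the reduced `n_splits=5`, and the default `LogisticRegression()` settings are assumptions made for illustration, not the asker's data or parameters.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy corpus; the question's own x and y are not shown.
x = np.array([
    "good movie", "great film", "nice plot", "fine acting", "lovely story",
    "bad movie", "awful film", "poor plot", "weak acting", "dull story",
])
y = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# n_splits=5 (not 10) only because the toy data has 5 samples per class.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_true_all, y_pred_all = [], []

for train_index, test_index in skf.split(x, y):
    # Fit the vectorizer on the training fold only, then reuse it on the test fold.
    tfidf = TfidfVectorizer()
    x_train = tfidf.fit_transform(x[train_index])
    x_test = tfidf.transform(x[test_index])

    clf = LogisticRegression()
    clf.fit(x_train, y[train_index])

    # Accumulate out-of-fold labels and predictions.
    y_true_all.extend(y[test_index])
    y_pred_all.extend(clf.predict(x_test))

# Score all out-of-fold predictions together.
cm = confusion_matrix(y_true_all, y_pred_all, labels=[0, 1])
report = classification_report(y_true_all, y_pred_all, labels=[0, 1], zero_division=0)
print(cm)
print(report)
```

Because every sample lands in exactly one test fold, the pooled matrix counts each sample once. Printing a report inside the loop instead would give one report per fold.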

How to instantiate a Scikit-Learn linear model with known coefficients without fitting it

一笑奈何 submitted on 2021-02-11 14:48:04
Question: Background: I am testing various saved models as part of an experiment, but one of the models comes from an algorithm I wrote, not from sklearn model fitting. However, my custom model is still a linear model, so I want to instantiate a LinearModel instance and set the coef_ and intercept_ attributes to the values from my custom fitting algorithm so I can use it for predictions. What I tried so far: from sklearn.linear_model import LinearRegression my_intercepts = np.ones(2) my_coefficients =
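A minimal sketch of the idea described in the question: create an unfitted `LinearRegression`, assign `coef_` and `intercept_` by hand, and call `predict`. The coefficient and intercept values here are made up for illustration. That `predict` accepts a model prepared this way depends on how scikit-learn checks for fitted attributes, which is an implementation detail rather than a documented guarantee, so treat it as an assumption to verify against the installed version.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical values standing in for the output of a custom fitting algorithm.
my_coefficients = np.array([2.0, -1.0])
my_intercept = 0.5

# Create the estimator without calling fit(), then set the learned attributes directly.
model = LinearRegression()
model.coef_ = my_coefficients
model.intercept_ = my_intercept

# predict() computes X @ coef_ + intercept_.
X_new = np.array([[1.0, 1.0], [3.0, 0.0]])
pred = model.predict(X_new)
print(pred)
```

A cross-check that does not depend on the estimator at all is `X_new @ my_coefficients + my_intercept`, which should give the same numbers.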

Should Cross Validation Score be performed on original or split data?

空扰寡人 submitted on 2021-02-11 14:46:21
Question: When I want to evaluate my model with cross validation, should I perform cross validation on the original data (data that is not split into train and test) or on the train / test data? I know that training data is used for fitting the model, and testing data for evaluating it. If I use cross validation, should I still split the data into train and test, or not? features = df.iloc[:,4:-1] results = df.iloc[:,-1] x_train, x_test, y_train, y_test = train_test_split(features, results, test_size=0.3, random_state=0) clf =

How to extract the boundary values from k-nearest neighbors predict

点点圈 submitted on 2021-02-11 14:24:30
Question: How can only the boundary values be extracted, or returned, from .predict for sklearn.neighbors.KNeighborsClassifier()? MRE: import pandas as pd import numpy as np from sklearn.datasets import load_iris from sklearn.neighbors import KNeighborsClassifier import seaborn as sns import matplotlib.pyplot as plt from matplotlib.colors import ListedColormap # prepare data iris = load_iris() X = iris.data y = iris.target df = pd.DataFrame(X, columns=iris.feature_names) df['label'] = y species_map =

tensorflow backend error. AttributeError: module 'tensorflow' has no attribute 'name_scope'

。_饼干妹妹 submitted on 2021-02-11 13:51:19
Question: I'm using version 2.1.0 of TensorFlow and 2.3.1 of Keras. While importing any Keras module, I get the TensorFlow backend error below. import pandas as pd, numpy as np, os, re, json, math, time from keras.models import Sequential from keras.layers import Dense from keras.wrappers.scikit_learn import KerasRegressor from sklearn.model_selection import cross_val_score from sklearn.model_selection import KFold from sklearn.preprocessing import StandardScaler from sklearn.pipeline import

AttributeError: 'numpy.ndarray' object has no attribute 'id'

大憨熊 submitted on 2021-02-11 13:42:08
Question: I am creating a sklearn pipeline that consists of 3 steps: (1) transforms a pandas dataframe into a 3D array, (2) transforms the 3D array into a recurrence plot (image), (3) trains an image classification model using Keras. This is my initial data set: train_df - pandas dataframe id cycle s1 1 1 0.05 1 2 0.04 1 3 0.05 1 4 0.05 2 1 0.02 2 2 0.03 y_train array([[1., 0., 0.], [1., 0., 0.], ... [1., 0., 0.]], dtype=float32) When I run my current code (see below), I get the following error: AttributeError: 'numpy.ndarray'

Attempting to fit a grid estimator, receiving TypeError : '<' not supported between instances of 'str' and 'int'

给你一囗甜甜゛ submitted on 2021-02-11 12:20:28
Question: I've been attempting to fit a grid search K Nearest Neighbors classifier, but am receiving the following error message: TypeError : '<' not supported between instances of 'str' and 'int' X_train compact sa area roofM3 h o glaz glazing_area_distribution 0 0.66 759.5 318.5 220.50 3.5 2 0.40 3 1 0.76 661.5 416.5 122.50 7.0 3 0.10 1 2 0.66 759.5 318.5 220.50 3.5 3 0.10 1 3 0.74 686.0 245.0 220.50 3.5 5 0.10 4 4 0.64 784.0 343.0 220.50 3.5 2 0.40 4 ... ... ... ... ... ... ... ... ... 609 0.98 514.5

How to plot a MDS from a similarity matrix?

强颜欢笑 submitted on 2021-02-11 12:13:18
Question: I'm using a similarity matrix with values between 0 and 1 (1 means that the elements are equal), and I'm trying to plot an MDS with Python and scikit-learn. I found multiple examples, but I'm not sure what to give as input to mds.fit(). For now, my data looks like this (file.csv) : ; A ; B ; C ; D ; E A ; 1 ; 0.1 ; 0.2 ; 0.5 ; 0.2 B ; 0.1 ; 1 ; 0.3 ; 1 ; 0 C ; 0.2 ; 0.3 ; 1 ; 0.8 ; 0.6 D ; 0.5 ; 1 ; 0.8 ; 1 ; 0.2 E ; 0.2 ; 0 ; 0.6 ; 0.2 ; 1 I'm currently using this code : import
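The asker's code is truncated, so this is a hedged sketch of one common approach rather than the asker's solution: convert the similarities to dissimilarities with `1 - similarity` and pass the result to `MDS` as a precomputed matrix. The `1 - s` conversion is an assumption (other conversions exist). The `dissimilarity="precomputed"` argument matches the scikit-learn releases current when the question was posted; newer releases may warn about or rename this parameter. The matrix values are the ones shown in the question.

```python
import numpy as np
from sklearn.manifold import MDS

labels = ["A", "B", "C", "D", "E"]

# Similarity matrix copied from the question (1 means the elements are equal).
similarity = np.array([
    [1.0, 0.1, 0.2, 0.5, 0.2],
    [0.1, 1.0, 0.3, 1.0, 0.0],
    [0.2, 0.3, 1.0, 0.8, 0.6],
    [0.5, 1.0, 0.8, 1.0, 0.2],
    [0.2, 0.0, 0.6, 0.2, 1.0],
])

# Assumed conversion: high similarity becomes small distance, zero on the diagonal.
dissimilarity = 1.0 - similarity

# Tell MDS that the input is already a pairwise dissimilarity matrix.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)

# One 2D point per element; these can be passed to a scatter plot.
for name, (cx, cy) in zip(labels, coords):
    print(name, cx, cy)
```

The rows of `coords` follow the order of `labels`, so a call such as `plt.scatter(coords[:, 0], coords[:, 1])` with matplotlib would draw the embedding.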

Separate pandas dataframe using sklearn's KFold

心不动则不痛 submitted on 2021-02-11 09:59:10
Question: I obtained the indices of the training set and testing set with the code below. df = pandas.read_pickle(filepath + filename) kf = KFold(n_splits = n_splits, shuffle = shuffle, random_state = randomState) result = next(kf.split(df), None) #train can be accessed with result[0] #test can be accessed with result[1] I wonder if there is a faster way to separate them into two DataFrames using the row indices I retrieved. Answer 1: You need DataFrame.iloc to select rows by position: Sample : np
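The answer's own sample is cut off, so here is a small sketch of what selecting rows with `DataFrame.iloc` looks like for the KFold indices above. The example DataFrame and the `n_splits`, `shuffle`, and `random_state` values are placeholders, not the asker's data or settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Placeholder data with a non-integer index, to show that iloc is positional.
df = pd.DataFrame(
    {"a": np.arange(10), "b": np.arange(10) * 2},
    index=list("abcdefghij"),
)

kf = KFold(n_splits=5, shuffle=True, random_state=0)

# KFold.split yields (train_positions, test_positions) as integer arrays.
train_idx, test_idx = next(kf.split(df))

# iloc selects rows by integer position, which is what KFold returns.
train_df = df.iloc[train_idx]
test_df = df.iloc[test_idx]

print(train_df.shape, test_df.shape)
```

`iloc` is the right accessor here because `KFold` returns positions, not index labels. `df.loc[train_idx]` would only work when the index happens to be the default 0..n-1 range.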

Separate pandas dataframe using sklearn's KFold

寵の児 submitted on 2021-02-11 09:58:32
Question: I obtained the indices of the training set and testing set with the code below. df = pandas.read_pickle(filepath + filename) kf = KFold(n_splits = n_splits, shuffle = shuffle, random_state = randomState) result = next(kf.split(df), None) #train can be accessed with result[0] #test can be accessed with result[1] I wonder if there is a faster way to separate them into two DataFrames using the row indices I retrieved. Answer 1: You need DataFrame.iloc to select rows by position: Sample : np