multilabel-classification

Python scikit-learn: Multilabel Classification ValueError: could not convert string to float:

南笙酒味 submitted on 2019-12-05 20:06:04
I am trying to do multilabel classification using scikit-learn 0.17. My data looks like this:

training
Col1             Col2
asd dfgfg        [1,2,3]
poioi oiopiop    [4]

test
Col1
asdas gwergwger
rgrgh hrhrh

My code so far:

import numpy as np
from sklearn import svm, datasets
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import average_precision_score
from sklearn.cross_validation import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

def getLabels():
    traindf = pickle.load(open("train.pkl","rb"))
    X = traindf['Col1']
    y = traindf
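This error usually means an estimator such as svm.SVC received raw strings (Col1) where it expects a numeric feature matrix; the list-valued Col2 also needs to become a binary indicator matrix, which is what MultiLabelBinarizer (rather than label_binarize) is for. The sketch below is one possible setup, not the asker's code: it assumes a recent scikit-learn, uses toy rows shaped like the question's data, and picks TfidfVectorizer plus OneVsRestClassifier(LinearSVC()) as an illustrative combination.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy data shaped like the question: text in Col1, label lists in Col2.
train_text = ["asd dfgfg", "poioi oiopiop", "asd poioi", "dfgfg oiopiop"]
train_labels = [[1, 2, 3], [4], [1, 4], [2, 3]]
test_text = ["asdas gwergwger", "rgrgh hrhrh"]

# Turn the label lists into a 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(train_labels)

# The vectorizer converts raw strings into numeric TF-IDF features, so the
# classifier never receives strings directly.
model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
model.fit(train_text, Y)

# Predictions come back as an indicator matrix; map them back to label tuples.
predicted = mlb.inverse_transform(model.predict(test_text))
print(mlb.classes_)
print(predicted)
```

Keeping the vectorizer inside the Pipeline means the test text is transformed with the vocabulary learned from the training text.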

Keras Multilabel Multiclass Individual Tag Accuracy

我怕爱的太早我们不能终老 submitted on 2019-12-05 14:49:21
I'm trying to perform multiclass multilabel classification with a CNN in Keras. I've attempted to create an individual label accuracy function based on this function from a similar question. The relevant code I have attempted is:

labels = ["dog", "mammal", "cat", "fish", "rock"] #I have more
interesting_id = [0]*len(labels)
interesting_id[labels.index("rock")] = 1 #we only care about rock's accuracy
interesting_label = K.variable(np.array(interesting_label), dtype='float32')

def single_class_accuracy(interesting_class_id):
    def single(y_true, y_pred):
        class_id_true = K.argmax(y_true, axis=-1)
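The excerpt uses K.argmax, which selects a single class per sample; with sigmoid outputs in a multilabel setup, several labels can be active at once, so a per-label accuracy usually thresholds that label's column instead. The NumPy sketch below shows the arithmetic only, assuming a 0.5 threshold and made-up predictions; the same computation could be rewritten with Keras backend ops inside a custom metric.

```python
import numpy as np

def single_label_accuracy(y_true, y_pred, label_index, threshold=0.5):
    """Accuracy of one label column in a multilabel problem.

    y_true: (n_samples, n_labels) array of 0/1 targets.
    y_pred: (n_samples, n_labels) array of sigmoid probabilities.
    """
    # Threshold this label's probability independently instead of using argmax,
    # because several labels may be active for the same sample.
    predicted = (y_pred[:, label_index] > threshold).astype(int)
    actual = y_true[:, label_index].astype(int)
    return float(np.mean(predicted == actual))

labels = ["dog", "mammal", "cat", "fish", "rock"]
rock = labels.index("rock")

y_true = np.array([
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
])
y_pred = np.array([
    [0.9, 0.8, 0.1, 0.2, 0.3],
    [0.1, 0.2, 0.1, 0.1, 0.7],
    [0.2, 0.6, 0.7, 0.1, 0.6],
    [0.1, 0.1, 0.2, 0.9, 0.4],
])

print(single_label_accuracy(y_true, y_pred, rock))  # 0.5
```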

Imbalanced Dataset for Multi-Label Classification

旧城冷巷雨未停 submitted on 2019-12-05 11:28:12
So I trained a deep neural network on a multi-label dataset I created (about 20000 samples). I switched from softmax to sigmoid and try to minimize (using the Adam optimizer):

tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(labels=y_, logits=y_pred))

And I end up with this kind of prediction (pretty "constant"):

Prediction for Im1 : [ 0.59275776 0.08751075 0.37567005 0.1636796 0.42361438 0.08701646 0.38991812 0.54468459 0.34593087 0.82790571]
Prediction for Im2 : [ 0.52609032 0.07885984 0.45780018 0.04995904 0.32828355 0.07349177 0.35400775 0.36479294 0.30002621 0.84438241]
Prediction for Im3
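One common remedy for imbalanced labels (a swapped-in technique, not something the question reports trying) is to up-weight positive examples per label. TensorFlow exposes this as tf.nn.weighted_cross_entropy_with_logits with a pos_weight argument (argument names have varied across TensorFlow versions). The NumPy sketch below reproduces that loss formula so the weighting is visible; choosing pos_weight = negatives / positives per label is an assumption, and the toy arrays are made up.

```python
import numpy as np

def positive_weights(y):
    """Per-label pos_weight = (#negatives / #positives), guarding labels with no positives."""
    positives = y.sum(axis=0)
    negatives = y.shape[0] - positives
    return negatives / np.maximum(positives, 1)

def weighted_sigmoid_cross_entropy(y, logits, pos_weight):
    """Mean sigmoid cross-entropy with positive terms scaled by pos_weight.

    Uses the stable form (1 - y) * x + (1 + (w - 1) * y) * softplus(-x).
    """
    softplus_neg = np.log1p(np.exp(-np.abs(logits))) + np.maximum(-logits, 0)
    loss = (1 - y) * logits + (1 + (pos_weight - 1) * y) * softplus_neg
    return float(loss.mean())

y = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
logits = np.array([[2.0, -1.0, 0.5], [-0.5, -2.0, 1.0],
                   [0.0, 1.5, -0.3], [-1.0, -0.2, 0.8]])

w = positive_weights(y)  # rare labels get weight 3, the frequent one gets 1/3
print(weighted_sigmoid_cross_entropy(y, logits, w))
```

With pos_weight set to all ones, the function reduces to the plain sigmoid cross-entropy used in the question.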

How to use SIFT features for bag of words in OpenCV?

风格不统一 submitted on 2019-12-04 09:45:45
I have read a lot of articles about implementing bag of words after extracting SIFT features from an image, but I'm still confused about what to do next. What do I specifically do? Thank you so much in advance for the guidance.

This is the code that I have so far:

cv::Mat mat_img = cropped.clone();
Mat grayForML;
cvtColor(mat_img, grayForML, CV_BGR2GRAY);
IplImage grayImageForML = grayForML.operator IplImage();
//create another copy of iplGray
IplImage *input = cvCloneImage(&grayImageForML);
Mat matInput = cvarrToMat(input);
// Mat matInput = copy_gray.clone();
cv::SiftFeatureDetector detector;
std:
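After SIFT extraction, the usual bag-of-words steps are: stack the descriptors from the training images, cluster them into k "visual words" (a vocabulary), then describe each image by a histogram of which word each of its descriptors falls closest to. That fixed-length histogram is what a classifier consumes. OpenCV's C++ API packages these steps as cv::BOWKMeansTrainer and cv::BOWImgDescriptorExtractor; the sketch below spells them out in NumPy instead, with random 128-dimensional vectors standing in for real SIFT descriptors and an arbitrary k = 8.

```python
import numpy as np

def build_vocabulary(descriptors, k, iterations=20, seed=0):
    """Plain k-means over stacked descriptors; returns k cluster centres (visual words)."""
    rng = np.random.default_rng(seed)
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iterations):
        # Assign every descriptor to its nearest centre (squared Euclidean distance).
        dists = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        assignment = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[assignment == j]
            if len(members):  # keep the old centre if a cluster becomes empty
                centres[j] = members.mean(axis=0)
    return centres

def bow_histogram(descriptors, centres):
    """Encode one image as an L1-normalised histogram of nearest visual words."""
    dists = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(centres)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Stand-in data: 5 "images", each with 30-59 random 128-d "SIFT descriptors".
rng = np.random.default_rng(42)
images = [rng.random((rng.integers(30, 60), 128)).astype(np.float32) for _ in range(5)]

vocabulary = build_vocabulary(np.vstack(images), k=8)
features = np.array([bow_histogram(d, vocabulary) for d in images])
print(features.shape)  # (5, 8): one fixed-length vector per image
```

Each row of `features` can then be fed to any classifier (for example an SVM); the vocabulary must be built once from training data and reused unchanged for test images.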

Multi-labels using two different LMDBs

人盡茶涼 submitted on 2019-12-04 08:36:24
I am new to the Caffe framework and I would like to use Caffe to train with multiple labels. I use two LMDBs to store the data and the labels, respectively. The data LMDB has dimension Nx1xHxW, while the label LMDB has dimension Nx1x1x3. Labels are float data. The text file is as follows:

5911 3
train/train_data/4224.bmp 13 0 12
train/train_data/3625.bmp 11 3 7
... ...

I use C++ to create the LMDBs. My main.cpp:

#include <algorithm>
#include <fstream>  // NOLINT(readability/streams)
#include <string>
#include <utility>
#include <vector>
#include <QImage>
#include "boost/scoped_ptr.hpp"
#include
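Before writing any LMDB, it can help to check that the label listing really yields the Nx1x1x3 float layout described, with rows in the same order as the image list (two separate data layers each read their own LMDB sequentially, so the records presumably need to stay aligned). The sketch below parses the listing format shown in the question into NumPy; reading the header "5911 3" as "number of images, labels per image" is an assumption, and the helper name parse_label_listing is hypothetical. Writing the resulting arrays into LMDB is not shown.

```python
import numpy as np

def parse_label_listing(lines):
    """Parse a '<count> <labels_per_image>' header plus '<path> l1 l2 ...' rows.

    Returns the image paths and a float32 label array shaped (N, 1, 1, L),
    i.e. the Nx1x1x3 label blob from the question when L == 3.
    The header interpretation is an assumption.
    """
    header = lines[0].split()
    count, per_image = int(header[0]), int(header[1])
    paths, labels = [], []
    for line in lines[1:]:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != per_image + 1:
            raise ValueError("unexpected row: %r" % line)
        paths.append(parts[0])
        labels.append([float(v) for v in parts[1:]])
    if len(paths) != count:
        raise ValueError("header says %d rows, found %d" % (count, len(paths)))
    blob = np.asarray(labels, dtype=np.float32).reshape(len(paths), 1, 1, per_image)
    return paths, blob

# Two-row toy listing (header count adapted to 2 for this example).
listing = """2 3
train/train_data/4224.bmp 13 0 12
train/train_data/3625.bmp 11 3 7""".splitlines()

paths, label_blob = parse_label_listing(listing)
print(label_blob.shape)  # (2, 1, 1, 3)
```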

UserWarning: Label not :NUMBER: is present in all training examples

偶尔善良 submitted on 2019-12-04 03:58:34
I am doing multilabel classification, where I try to predict the correct labels for each document. Here is my code:

mlb = MultiLabelBinarizer()
X = dataframe['body'].values
y = mlb.fit_transform(dataframe['tag'].values)

classifier = Pipeline([
    ('vectorizer', CountVectorizer(lowercase=True, stop_words='english', max_df = 0.8, min_df = 10)),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])

predicted = cross_val_predict(classifier, X, y)

When running my code I get multiple warnings:

UserWarning: Label not :NUMBER: is present in all training examples.

When I print out
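In scikit-learn's OneVsRestClassifier, each binarized label column is fitted as its own binary problem; when a column is constant within a training fold, it warns and fits a constant predictor for that column. "Label not 2 is present in all training examples" means the negative class "not 2" covers every training row, i.e. column 2 of the binarized matrix has no positive example in that fold. The sketch below finds such columns per fold, assuming contiguous, unshuffled folds in the spirit of KFold; the tag names and the small matrix are made up.

```python
import numpy as np

def constant_label_columns(Y, n_splits=5):
    """For each contiguous fold, report label columns that are constant in the
    remaining training rows: (never present, always present).

    Y: (n_samples, n_labels) 0/1 indicator matrix, e.g. from MultiLabelBinarizer.
    """
    report = []
    for test_idx in np.array_split(np.arange(len(Y)), n_splits):
        train_mask = np.ones(len(Y), dtype=bool)
        train_mask[test_idx] = False
        positives = Y[train_mask].sum(axis=0)
        never = np.flatnonzero(positives == 0)                  # -> "Label not i ..."
        always = np.flatnonzero(positives == train_mask.sum())  # -> "Label i ..."
        report.append((never.tolist(), always.tolist()))
    return report

tags = np.array(["python", "java", "c++"])  # hypothetical; in the question: mlb.classes_
Y = np.array([[1, 0, 0], [1, 0, 1], [0, 1, 0],
              [1, 1, 0], [0, 1, 0], [1, 0, 0]])

report = constant_label_columns(Y, n_splits=3)
for fold, (never, always) in enumerate(report):
    print(fold, tags[never].tolist(), tags[always].tolist())
# Fold 0 trains on rows 2-5, where "c++" (column 2) never occurs.
```

Tags that occur in only a handful of documents are the usual source of this warning; counting how many documents carry each tag is a quick first check.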

Scikit Learn Multilabel Classification: ValueError: You appear to be using a legacy multi-label data representation

被刻印的时光 ゝ submitted on 2019-12-03 16:32:22
I am trying to use scikit-learn 0.17 with Anaconda 2.7 for a multilabel classification problem. Here is my code:

import pandas as pd
import pickle
import re
from sklearn.cross_validation import train_test_split
from sklearn.metrics.metrics import classification_report, accuracy_score, confusion_matrix
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB as MNB
from sklearn.pipeline import Pipeline
from
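The "legacy multi-label data representation" error refers to passing y as a sequence of label lists (one list or tuple per sample); scikit-learn expects a binary indicator matrix (or sparse matrix) instead, and MultiLabelBinarizer performs that conversion. The sketch below uses made-up tag lists, since the question's y is not shown. Estimators without native multilabel support, such as MultinomialNB, are commonly wrapped in OneVsRestClassifier once y is in this form.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Legacy form that triggers the error: one list of labels per sample.
y_lists = [["sports", "news"], ["politics"], ["news"], []]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_lists)  # dense 0/1 matrix, one column per label

print(mlb.classes_.tolist())  # ['news', 'politics', 'sports']
print(Y.tolist())             # [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 0]]

# Predictions made against Y can be mapped back to label tuples.
print(mlb.inverse_transform(Y))
```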

Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?

£可爱£侵袭症+ submitted on 2019-12-03 08:40:45
I am doing multi-label classification, where I am trying to predict the correct tags for questions (X = questions, y = list of tags for each question in X). I am wondering which decision_function_shape for sklearn.svm.SVC should be used with OneVsRestClassifier.

From the docs we can read that decision_function_shape can have two values, 'ovo' and 'ovr':

decision_function_shape : ‘ovo’, ‘ovr’ or None, default=None
Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which
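Inside OneVsRestClassifier, each wrapped SVC only ever sees one binary 0/1 tag column, and in scikit-learn's implementation decision_function_shape only changes the output when there are more than two classes. So for this setup the two values should give identical scores. The sketch below checks that on synthetic data from make_multilabel_classification, standing in for the question's vectorized text and tag matrix.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-in for "questions -> tags": 200 samples, 4 tags.
X, Y = make_multilabel_classification(n_samples=200, n_features=20,
                                      n_classes=4, random_state=0)

scores = {}
for shape in ("ovo", "ovr"):
    clf = OneVsRestClassifier(SVC(decision_function_shape=shape))
    clf.fit(X, Y)
    # One binary SVC per tag, so the combined scores are (n_samples, n_tags).
    scores[shape] = clf.decision_function(X)

print(scores["ovr"].shape)                        # (200, 4)
print(np.allclose(scores["ovo"], scores["ovr"]))  # True: every inner SVC is binary
```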

How to manually specify class labels in Keras flow_from_directory?

蹲街弑〆低调 submitted on 2019-12-03 08:00:42
Problem: I am training a model for multilabel image recognition, so my images are associated with multiple y labels. This conflicts with the convenient Keras method "flow_from_directory" of the ImageDataGenerator, where each image is supposed to be in the folder of its corresponding label (https://keras.io/preprocessing/image/).

Workaround: Currently, I am reading all images into a NumPy array and using the "flow" function from there. But this results in heavy memory loads and a
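One way around holding every image in memory is a generator that keeps only file paths plus a multi-hot label matrix and loads one batch at a time. Keras's training loop can typically consume such a generator (fit_generator in older Keras 2.x releases, fit in TensorFlow 2). The sketch below is a plain Python/NumPy version; multilabel_batches is a hypothetical helper, and the image loader is passed in as a parameter because the question does not say how images are read (the demo uses a fake loader that returns constant arrays).

```python
import numpy as np

def multilabel_batches(paths, label_matrix, load_image, batch_size=32,
                       shuffle=True, seed=0):
    """Yield (images, labels) batches forever, loading only one batch at a time.

    paths:        list of image file paths.
    label_matrix: (n_images, n_labels) multi-hot array; row i belongs to paths[i].
    load_image:   callable path -> np.ndarray (plug in your own image reader).
    """
    label_matrix = np.asarray(label_matrix)
    rng = np.random.default_rng(seed)
    order = np.arange(len(paths))
    while True:  # Keras-style generators are expected to loop indefinitely
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            images = np.stack([load_image(paths[i]) for i in idx])
            yield images, label_matrix[idx]

def fake_load(path):
    """Demo loader: a constant 4x4x3 'image' whose value is the path length."""
    return np.full((4, 4, 3), len(path), dtype=np.float32)

paths = ["a.jpg", "bb.jpg", "ccc.jpg"]
labels = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]

gen = multilabel_batches(paths, labels, fake_load, batch_size=2, shuffle=False)
x, y = next(gen)
print(x.shape, y.shape)  # (2, 4, 4, 3) (2, 3)
```

Because the labels are just rows of a matrix aligned with the path list, an image can carry any number of tags, which the one-folder-per-label layout of flow_from_directory cannot express.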

Scikit Learn Multilabel Classification: ValueError: You appear to be using a legacy multi-label data representation

空扰寡人 submitted on 2019-12-03 06:47:50
I am trying to use scikit-learn 0.17 with Anaconda 2.7 for a multilabel classification problem. Here is my code:

import pandas as pd
import pickle
import re
from sklearn.cross_validation import train_test_split
from sklearn.metrics.metrics import classification_report, accuracy_score, confusion_matrix
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB as MNB
from sklearn.pipeline import Pipeline
from sklearn.grid_search import GridSearchCV

traindf = pickle.load(open("train.pkl","rb"))
X, y = traindf[