multilabel-classification

Multilabel Text Classification using TensorFlow

痞子三分冷, posted on 2019-12-03 00:53:41
Question: The text data is organized as a vector with 20,000 elements, like [2, 1, 0, 0, 5, ...., 0]. The i-th element indicates the frequency of the i-th word in a text. The ground truth label data is also represented as a vector with 4,000 elements, like [0, 0, 1, 0, 1, ...., 0]. The i-th element indicates whether the i-th label is a positive label for a text. The number of labels differs from text to text. I have code for single-label text classification. How can I edit the following code for

How to manually specify class labels in keras flow_from_directory?

只谈情不闲聊, posted on 2019-12-02 20:41:31
Problem: I am training a model for multilabel image recognition. My images are therefore associated with multiple y labels. This conflicts with the convenient Keras method "flow_from_directory" of the ImageDataGenerator, where each image is supposed to be in the folder of the corresponding label ( https://keras.io/preprocessing/image/ ).
Workaround: Currently, I am reading all images into a numpy array and using the "flow" function from there. But this results in a heavy memory load and a slow read-in process.
Question: Is there a way to use the "flow_from_directory" method and to supply

Multilabel Text Classification using TensorFlow

不问归期, posted on 2019-12-02 14:18:07
The text data is organized as a vector with 20,000 elements, like [2, 1, 0, 0, 5, ...., 0]. The i-th element indicates the frequency of the i-th word in a text. The ground truth label data is also represented as a vector with 4,000 elements, like [0, 0, 1, 0, 1, ...., 0]. The i-th element indicates whether the i-th label is a positive label for a text. The number of labels differs from text to text. I have code for single-label text classification. How can I edit the following code for multilabel text classification? In particular, I would like to know the following points. How to compute accuracy
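One common way to approach the accuracy part of this question is to treat each of the label positions as an independent yes/no decision: apply a sigmoid to every output logit, threshold it, and compare against the 0/1 ground truth vector. The sketch below is a minimal pure-Python illustration under that assumption, not the asker's code. The function names, the 0.5 threshold, and the toy logits (5 labels instead of 4,000) are all hypothetical. In TensorFlow, the analogous loss is tf.nn.sigmoid_cross_entropy_with_logits rather than a softmax-based one.

```python
import math


def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(logits, targets):
    """Mean per-label sigmoid cross-entropy for a single example."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(targets)


def predict_labels(logits, threshold=0.5):
    """Turn logits into a 0/1 vector; the threshold is an assumption."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]


def hamming_accuracy(y_true, y_pred):
    """Fraction of individual label positions predicted correctly."""
    correct = sum(
        t == p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    total = sum(len(row) for row in y_true)
    return correct / total


def subset_accuracy(y_true, y_pred):
    """Fraction of examples whose whole label vector is exactly right."""
    exact = sum(row_t == row_p for row_t, row_p in zip(y_true, y_pred))
    return exact / len(y_true)


# Toy data: 2 texts, 5 labels each (hypothetical values).
logits = [[-2.0, -1.5, 3.0, -0.5, 2.2], [1.0, -3.0, -2.0, 0.4, -1.0]]
y_true = [[0, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
y_pred = [predict_labels(row) for row in logits]

print("predictions:", y_pred)
print("hamming accuracy:", hamming_accuracy(y_true, y_pred))
print("subset accuracy:", subset_accuracy(y_true, y_pred))
print("loss (first text):", binary_cross_entropy(logits[0], y_true[0]))
```

Hamming accuracy is forgiving when most labels are 0, as they would be with 4,000 sparse labels, so subset accuracy or precision/recall may be more informative in practice.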

how many classes h2o deep learning algorithm accepts?

拜拜、爱过, posted on 2019-12-02 07:29:58
Question: I want to predict the response variable, and it has 700 classes.
Deep learning model parameters:

    from h2o.estimators import deeplearning

    dl_model = deeplearning.H2ODeepLearningEstimator(
        hidden=[200, 200],
        epochs=10,
        missing_values_handling='MeanImputation',
        max_categorical_features=4,
        distribution='multinomial'
    )

    # Train the model
    dl_model.train(
        x=Content_vecs.names,
        y='tags',
        training_frame=data_split[0],
        validation_frame=data_split[1]
    )

Original response variable (tags): apps, email,

inconsistent shape error MultiLabelBinarizer on y_test, sklearn multi-label classification

旧巷老猫, posted on 2019-12-02 04:41:54

    import numpy as np
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.linear_model import SGDClassifier
    from sklearn.feature_extraction.text import TfidfTransformer
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
    from sklearn.model_selection import train_test_split
    from sklearn import preprocessing
    from sklearn.svm import SVC

    data = r'C:\Users\...\Downloads\news_v1.xlsx'
    df = pd.read_excel(data)

how many classes h2o deep learning algorithm accepts?

核能气质少年, posted on 2019-12-02 04:39:01
I want to predict the response variable, and it has 700 classes.
Deep learning model parameters:

    from h2o.estimators import deeplearning

    dl_model = deeplearning.H2ODeepLearningEstimator(
        hidden=[200, 200],
        epochs=10,
        missing_values_handling='MeanImputation',
        max_categorical_features=4,
        distribution='multinomial'
    )

    # Train the model
    dl_model.train(
        x=Content_vecs.names,
        y='tags',
        training_frame=data_split[0],
        validation_frame=data_split[1]
    )

Original response variable (tags):

    apps, email, mail
    finance,freelancers,contractors,zen99
    genomes
    gogovan
    brazil,china,cloudflare
    hauling,service

Sklearn - How to predict probability for all target labels

霸气de小男生, posted on 2019-12-01 18:19:44
I have a data set with a target variable that can have 7 different labels. Each sample in my training set has only one label for the target variable. For each sample, I want to calculate the probability of each of the target labels, so my prediction would consist of 7 probabilities for each row. On the sklearn website I read about multi-label classification, but this doesn't seem to be what I want. I tried the following code, but it only gives me one classification per sample.

    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.tree import DecisionTreeClassifier  # import missing from the excerpt

    clf = OneVsRestClassifier(DecisionTreeClassifier())
    clf.fit(X

nolearn for multi-label classification

帅比萌擦擦*, posted on 2019-12-01 11:46:28
I tried to use the DBN function imported from the nolearn package, and here is my code:

    from nolearn.dbn import DBN
    import numpy as np
    from sklearn import cross_validation

    fileName = 'data.csv'
    fileName_1 = 'label.csv'
    data = np.genfromtxt(fileName, dtype=float, delimiter=',')
    label = np.genfromtxt(fileName_1, dtype=int, delimiter=',')

    clf = DBN(
        [data, 300, 10],
        learn_rates=0.3,
        learn_rate_decays=0.9,
        epochs=10,
        verbose=1,
    )
    clf.fit(data, label)
    score = cross_validation.cross_val_score(clf, data, label, scoring='f1', cv=10)
    print score

Since my data has the shape (1231, 229) and label with the shape

nolearn for multi-label classification

老子叫甜甜, posted on 2019-12-01 08:38:38
Question: I tried to use the DBN function imported from the nolearn package, and here is my code:

    from nolearn.dbn import DBN
    import numpy as np
    from sklearn import cross_validation

    fileName = 'data.csv'
    fileName_1 = 'label.csv'
    data = np.genfromtxt(fileName, dtype=float, delimiter=',')
    label = np.genfromtxt(fileName_1, dtype=int, delimiter=',')

    clf = DBN(
        [data, 300, 10],
        learn_rates=0.3,
        learn_rate_decays=0.9,
        epochs=10,
        verbose=1,
    )
    clf.fit(data, label)
    score = cross_validation.cross_val_score(clf, data,

Precision/recall for multiclass-multilabel classification

喜夏-厌秋, posted on 2019-11-29 22:58:30
I'm wondering how to calculate precision and recall measures for multiclass multilabel classification, i.e. classification where there are more than two labels and where each instance can have multiple labels.

For multi-label classification you have two ways to go. First consider the following:

    n is the number of examples.
    Y_i is the ground truth label assignment of the i-th example.
    x_i is the i-th example.
    h(x_i) is the predicted labels for the i-th example.

Example based
The metrics are computed in a per-datapoint manner. For each predicted label, only its score is computed, and then these scores are aggregated over
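The excerpt is cut off before it gives the formulas, so the following is only a sketch of one widely used example-based definition, not necessarily the one the answer goes on to state. Writing Y_i for the true label set and h(x_i) for the predicted label set of the i-th example, precision is taken as the mean of |Y_i ∩ h(x_i)| / |h(x_i)| and recall as the mean of |Y_i ∩ h(x_i)| / |Y_i| over the n examples. The function name, the toy label sets, and the choice to score an empty set as 0 are assumptions made for illustration.

```python
def example_based_precision_recall(true_sets, pred_sets):
    """Average per-example precision and recall over all examples.

    true_sets[i] plays the role of Y_i, pred_sets[i] the role of h(x_i).
    An empty denominator contributes 0.0 (an assumed convention).
    """
    n = len(true_sets)
    precision_sum = 0.0
    recall_sum = 0.0
    for truth, pred in zip(true_sets, pred_sets):
        overlap = len(truth & pred)  # labels that are both true and predicted
        precision_sum += overlap / len(pred) if pred else 0.0
        recall_sum += overlap / len(truth) if truth else 0.0
    return precision_sum / n, recall_sum / n


# Hypothetical ground truth (Y) and predictions (H) for three examples.
Y = [{"apps", "email"}, {"finance"}, {"cloud", "china", "brazil"}]
H = [{"apps"}, {"finance", "email"}, {"cloud", "china"}]

precision, recall = example_based_precision_recall(Y, H)
print("example-based precision:", round(precision, 4))
print("example-based recall:", round(recall, 4))
```

Because each example is scored on its own before averaging, an example with many labels counts no more than an example with one, which is the main difference from label-based (micro or macro) averaging.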