Tensorflow Invalid Argument: Assertation Failed [Label IDs must < n_classes]

匿名 (未验证) 提交于 2019-12-03 10:24:21

问题:

I get an error implementing a DNNClassifier in Tensorflow 1.3.0 with Python 2.7. I got the sample code from the Tensorflow tf.estimator Quickstart Tutorial and I want to run it with my own dataset: 3D coordinates and 10 different classes (int labels). Here is my implementation:

#!/usr/bin/env python # -*- coding: utf-8 -*-  def ReadLabels(file):     #load the labels from test file here     labelFile = open(file, "r")     Label = labelFile.readlines();     returnL = [[Label[i][j+1] for j in range(len(Label[0])-3)] for i in range(len(Label))]     returnLint = list();     for i in range(len(returnL)):         tmp = ''         for j in range(len(returnL[0])):             tmp += str(returnL[i][j])         returnLint.append(int(tmp))     return returnL, returnLint  def NumpyReadBin(file,numcols,type):     #load the data from binary file here     import numpy as np     trainData = np.fromfile(file,dtype=type)     numrows = len(trainData)/numcols     #print trainData[0:100]     result = [[trainData[i+j*numcols] for i in range(numcols)] for j in range(numrows)]     return result  def TensorflowDNN():     #load sample dataset     trainData = NumpyReadBin('data/TrainingData.dat',3,'float32')     valData = NumpyReadBin('data/ValidationData.dat',3,'float32')     testData = NumpyReadBin('data/TestingData.dat',3,'float32')     #load sample labels     trainL, trainLint = ReadLabels('data/TrainingLabels.txt')     validateL, validateLint = ReadLabels('data/ValidationLabels.txt')     testL, testLint = ReadLabels('data/TestingLabels.txt')      import tensorflow as tf     import numpy as np      #get unique labels     uniqueTrain = set()     for l in trainLint:         uniqueTrain.add(l)     uniqueTrain = list(uniqueTrain)     numClasses = len(uniqueTrain)     numDims = len(trainData[0])      #All features have real-value data     feature_columns = [tf.feature_column.numeric_column("x", shape=[3])]      # Build 3 layer DNN with 10, 20, 10 units respectively.     classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,                                               hidden_units=[10, 20, 10],                                               n_classes=numClasses,                                               model_dir="../Classification/tmp")      # Define training inputs     train_input_fn = tf.estimator.inputs.numpy_input_fn(                                                 x={"x": np.array(trainData)},y=np.array(trainLint),                                                 num_epochs = None, shuffle = True)      #Train the model     classifier.train(input_fn = train_input_fn, steps = 2000)      #Define Validation inputs     val_input_fn = tf.estimator.inputs.numpy_input_fn(                                                 x={"x": np.array(valData)},y=np.array(validateLint),                                                 num_epochs = 1, shuffle = False)      # Evaluate accuracy.     accuracy_score = classifier.evaluate(input_fn=val_input_fn)["accuracy"]     print("\nTest Accuracy: {0:f}\n".format(accuracy_score))  if __name__ == '__main__':     TensorflowDNN() 

The Functions RedLabels(...) and NumpyReadBin(...) are loading my saved dataset in tensors. Since the labels are integer numbers that I read from a text file the function is a bit weird, but what I get in the end is an array with integers from tese labels: [11, 12, 21, 22, 23, 31, 32, 33, 41, 42].

However I am not able to classify anything, because upon calling classifier.train(input_fn = train_input_fn, steps = 2000), I get the following error:

...Traceback and stuff like that... InvalidArgumentError (see above for traceback): assertion failed: [Label IDs must < n_classes] [Condition x < y did not hold element-wise:x (dnn/head/labels:0) = ] [[21][32][42]...] [y (dnn/head/assert_range/Const:0) = ] [10] [[Node: dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/Switch/_117, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/data_0, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/data_1, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/Switch_1/_119, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/data_3, dnn/head/assert_range/assert_less/Assert/AssertGuard/Assert/Switch_2/_121)]] 

Did anyone come across this error before or has an idea how to solve it? I guess it is somehow complaining about the number of classes/format of labels in my dataset, but I know that trainLint contains 10 different classlabels, and that is the value of numClasses. Could it be the format of my trainLint array?

回答1:

So the solution as Ishant Mrinal pointed out:

Tensorflow expects the integers from 0 up to the number of classes as class labels (range(0, num_classes)), not "arbitrary" numbers like in my case. Thanks!:)

...The other option I just came across is to add a label_vocabulary to the classifier-definition:

classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,                                           hidden_units=[10, 20, 10],                                           n_classes=numClasses,                                           model_dir=saveAt,                                           label_vocabulary=uniqueTrain) 

With this option I can define the labels like I had before, converted to strings.



易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!